Painting a Picture with Web Scraping
Mar 11, 2019
Gone are the days of conducting business exclusively through brick-and-mortar establishments. Online retail through company websites is no longer cutting edge, and businesses and individuals increasingly use social media to advertise, promote, and sell their products and services. Information such as location, size, and income sources can be mined from these sites. The ability to collect public information from social media, integrate it with agency data, and perform peer and network analysis can be a powerful tool for revenue departments to resolve compliance cases more efficiently and increase revenue collection.
Imagine a situation where a restaurant is reporting what looks to be a reasonable amount of sales for a small business with 5 employees. However, a search on OpenTable and Yelp reveals that it is open 7 days a week, 12 hours a day, has a party room, and sells alcohol. In addition, mapping the address listed on these sites shows that the restaurant is physically large and located in a busy, commercial part of town. Agency data alone wouldn’t highlight these discrepancies for an auditor. By ingesting social media data and joining it to return and agency data, however, an auditor can easily compare revenue, gross sales, number of W-2s, and withholding amounts against other restaurants with similar hours, menus, and pricing structures, and can even control for location. With this ability, anomalies that would previously have stayed hidden can be brought to light and used to bring taxpayers into compliance more quickly. If tax forms can be thought of as the narrative that describes how people and businesses make and spend their money over the year, then the addition of social media information allows revenue agencies to more easily, vividly, and accurately label those narratives as fiction or non-fiction.
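To make the peer comparison concrete, here is a minimal sketch in Python. It assumes the scraped listing attributes have already been joined to agency-reported sales. The field names, the sample figures, and the 50% threshold are all hypothetical, chosen only for illustration, and do not describe any particular product or agency method.

```python
from statistics import median

# Hypothetical joined records: agency-reported annual sales plus
# attributes scraped from public listings. All values are made up.
restaurants = [
    {"name": "A", "reported_sales": 900_000, "hours_per_week": 84, "serves_alcohol": True},
    {"name": "B", "reported_sales": 1_000_000, "hours_per_week": 84, "serves_alcohol": True},
    {"name": "C", "reported_sales": 950_000, "hours_per_week": 80, "serves_alcohol": True},
    {"name": "D", "reported_sales": 250_000, "hours_per_week": 84, "serves_alcohol": True},
]


def sales_per_open_hour(record):
    """Reported annual sales divided by estimated annual open hours."""
    return record["reported_sales"] / (record["hours_per_week"] * 52)


def flag_low_reporters(records, threshold=0.5):
    """Return names whose sales per open hour fall below
    threshold * the median of their peers (same alcohol status)."""
    flagged = []
    for record in records:
        peers = [
            p for p in records
            if p is not record and p["serves_alcohol"] == record["serves_alcohol"]
        ]
        if len(peers) < 2:
            continue  # too few peers for a meaningful comparison
        peer_median = median(sales_per_open_hour(p) for p in peers)
        if sales_per_open_hour(record) < threshold * peer_median:
            flagged.append(record["name"])
    return flagged


print(flag_low_reporters(restaurants))
```

With this sample data, only restaurant D is flagged: it is open as long as its peers but reports roughly a quarter of their sales per open hour. A real analysis would compare on more attributes (location, pricing, seating capacity) and would treat a flag as a prompt for review, not a conclusion.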
For more information about tools that can help scan, extract, analyze, and report, check out the RevHub Social Scanner.