It goes without saying that we live in a world where data is more valuable than oil. We like to think that most of this data is locked away on the secure servers of the many private companies that collect it, but anyone who hasn’t been sleeping for the past decade also knows that those servers are not immune to attack. We’ve seen big breaches at Target, Home Depot, and Equifax, among others. We also ‘give’ our data away so that advertisers can target their ads on the platforms that have become a part of our daily lives. Alongside all of this data, data brokers have emerged that provide your data, as a service, to whoever wants it. This is nothing new. Back in the day, a skilled sleuth could gather information from the big yellow phone books, 411, directories, public records offices, and so on. Nowadays it’s just much easier, and it can all be done from a computer.
In fact, while researching my town’s history, I uncovered the early-20th-century version of status updates, which were printed in the newspaper. Each one gives a one- or two-sentence summary of a notable happening around town. This is how the whole town came to know that Elston Slater hit himself in the face so hard while cutting wood that his teeth came loose.
As an exercise, I wanted to see how much of this data I could harvest on the residents of my small New Jersey town. I don’t know every mailing address in my town, but luckily the United States Department of Transportation released the National Address Database (NAD), which can be used to get all of those postal addresses. Next, I put together a basic scraper that ran a reverse address search on each address, gathered all the information the data broker returned, and stored it in an SQL database. I will not mention the specifics of this project or post any related code, because I do not want to promote malicious use of the data (e.g. doxxing) by bad actors. My purpose in this article is to show what can be achieved with no money, an afternoon, and some basic coding knowledge.
I was able to obtain 27,555 records on 25,212 past and present residents of my town. To put this into perspective, the town had an estimated 9,356 residents in 2019. Although the quality of the data is not optimal (lots of @altavista email addresses and other data that looks outdated or estimated), I was able to obtain names, phone numbers, how long a person lived at a given address, how many people lived in the home, possible IP addresses, education levels, and occupations.
It’s important to note that you can opt out of being listed on these data brokers’ sites, and there are services, such as privacyduck, that will do this for you for a fee.