Data Augmentation For Marketing Campaign
Project
Anyone who runs online marketing campaigns knows that the larger the database of potential clients, the greater the chances of converting a lead. And the more accurate and complete the contact list is, the greater the chance that the target person actually opens your email.
Data enrichment is therefore an indispensable step, and experienced marketers routinely turn to data mining firms like ours.
The goal of this project was to enrich a database of 3 million email addresses with information about their owners. Data augmentation is one of the most common tasks in data processing for marketing and advertising, because email addresses without owner details are not enough to create a high-quality marketing newsletter.
Challenges
We used Google to find and pull the additional information for data augmentation. When you search Google for a corporate email address, the owner’s details appear in one of the first three organic snippets in the majority of cases, so this is where we took the data from. If personal details were missing from the first three results, we opened the result’s web page and scanned it for them. To skip irrelevant snippets, we developed an intelligent system of filters based on blacklisted and whitelisted resources.
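As a rough illustration of what such a filter might look like (a minimal sketch, not our production system), the Python snippet below keeps the first three organic results, drops blacklisted domains, and moves whitelisted sources to the front. The domain lists, the `select_snippets` name, and the result format (a list of dicts with a `url` key) are all assumptions made for this example.

```python
from urllib.parse import urlparse

# Hypothetical lists; the real filter rules are not part of this write-up.
BLACKLIST = {"spam-directory.example"}
WHITELIST = {"linkedin.com"}

def domain_of(url):
    """Return the host of a URL, lowercased and without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one."""
    return any(host == d or host.endswith("." + d) for d in domains)

def select_snippets(results, top_n=3):
    """Keep the first top_n results, drop blacklisted ones, whitelisted first."""
    kept = [r for r in results[:top_n]
            if not matches(domain_of(r["url"]), BLACKLIST)]
    # sorted() is stable: whitelisted sources move to the front,
    # everything else keeps its original ranking order.
    return sorted(kept, key=lambda r: not matches(domain_of(r["url"]), WHITELIST))
```

Sorting rather than discarding non-whitelisted results keeps a fallback available when no trusted source shows up in the top three.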
Another issue is that Google does its best to fight scraping and blocks abusive IP addresses after a certain number of requests. To resolve this, we used our proprietary system that collects free proxies from the Internet (which in most cases saves our clients a significant amount of money, as they don’t have to pay for proxy services). With proxies, the queries appear to Google to come from multiple locations, and if a proxy is banned, it is replaced with another one so that scraping continues uninterrupted.
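The replace-on-ban idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ProxyPool`, `fetch_with_rotation`, and the caller-supplied `send` function (which is assumed to return `None` when a request is blocked) are hypothetical names, and the sketch does not show how proxies are collected.

```python
from collections import deque

class ProxyPool:
    """Round-robin pool that retires proxies once they are banned."""

    def __init__(self, proxies):
        self._active = deque(proxies)
        self.banned = set()

    def next_proxy(self):
        """Return the next working proxy and move it to the back of the queue."""
        if not self._active:
            raise RuntimeError("no working proxies left")
        proxy = self._active[0]
        self._active.rotate(-1)
        return proxy

    def ban(self, proxy):
        """Remove a blocked proxy from rotation."""
        if proxy in self._active:
            self._active.remove(proxy)
            self.banned.add(proxy)

def fetch_with_rotation(query, pool, send, max_attempts=5):
    """Send the query through successive proxies until one is not blocked.

    `send(query, proxy)` is supplied by the caller and is assumed to
    return None when the proxy has been blocked.
    """
    for _ in range(max_attempts):
        proxy = pool.next_proxy()
        response = send(query, proxy)
        if response is not None:
            return response
        pool.ban(proxy)  # swap the banned proxy out and try the next one
    raise RuntimeError("all attempts were blocked")
```

In practice the pool would also be refilled with freshly collected proxies, which this sketch leaves out.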
Because proxy servers require delays between requests, we sped the process up by running the searches in simultaneous threads (up to 800).
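Since each request spends most of its time waiting, threads are a natural fit. The sketch below uses Python’s standard `ThreadPoolExecutor` to show the general pattern; `lookup` is a hypothetical placeholder that only sleeps and parses the address instead of performing a real search, and `max_workers=8` is kept deliberately small for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def lookup(email, delay=0.01):
    """Placeholder for one search request; the sleep mimics the per-proxy delay."""
    time.sleep(delay)
    return email, {"domain": email.split("@", 1)[1]}

def enrich_all(emails, max_workers=8):
    """Run the lookups concurrently and collect the results by email address."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, whichever thread finishes first.
        return dict(pool.map(lookup, emails))
```

Because the threads overlap their waiting time, total run time is governed by the number of workers rather than by the sum of the individual delays.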
3 million email addresses
92% data accuracy
Solution
We successfully enriched the database of 3 million email addresses in a short time frame, reaching up to 92% data accuracy.
