Data Augmentation for Marketing Campaign - GroupBWT

This large-scale data enrichment job involved multithreaded Google Search scraping and rotating proxies to identify the owners of about 3 million email addresses.

Project

Anyone who runs online marketing campaigns knows that the bigger the database of potential clients, the greater the chance of converting a lead. And the more accurate and complete the contact list is, the greater the chance that the recipient actually opens your email.

That is why data enrichment is indispensable here, and experienced marketers turn to data mining firms like us.

The goal of this project was to enrich a database of 3 million email addresses with information about their owners. Data augmentation is one of the most common data processing tasks in marketing and advertising: email addresses without owner details are not enough to build a high-quality marketing newsletter.

Challenges


The more owner information an email contains, the better its chances of reaching the inbox instead of being filtered out as spam. If you have the owner's first name, last name and job title, you can write a personalized email, which increases the chance that the recipient will be intrigued enough to open and read it.

We used Google Search to find additional information for data augmentation. When you search Google for a corporate email address, in most cases the owner's details appear in one of the first three organic snippets, so this is where we extracted the data from. If personal details are missing from the first three results, we open the corresponding web pages and scan them for personal details. To skip irrelevant snippets, we developed an intelligent filtering system based on blacklisted and whitelisted resources.
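The snippet-selection step can be sketched in Python. This is a minimal illustration under stated assumptions, not the filter actually used on the project: the `Snippet` type, the `select_snippets` helper, the example blacklist and whitelist domains, and the rule that a relevant snippet must contain the email address are all hypothetical. It keeps only the first three organic results, drops blacklisted sources, and moves whitelisted ones to the front.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical example lists; the real filter contents are not published.
BLACKLIST = {"spam-directory.example", "email-checker.example"}
WHITELIST = {"linkedin.com", "crunchbase.com"}


@dataclass
class Snippet:
    url: str
    title: str
    text: str


def domain_of(url: str) -> str:
    """Return the host of a URL without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def in_list(domain: str, domains: set[str]) -> bool:
    """True if `domain` equals a listed domain or is one of its subdomains."""
    return any(domain == d or domain.endswith("." + d) for d in domains)


def select_snippets(email: str, results: list[Snippet], top_n: int = 3) -> list[Snippet]:
    """Filter the first `top_n` organic results for one email lookup.

    Blacklisted sources and snippets that never mention the address are dropped;
    whitelisted sources move to the front, other results keep their search rank.
    An empty result means the caller should fall back to scanning the pages.
    """
    needle = email.lower()
    kept = []
    for s in results[:top_n]:
        if in_list(domain_of(s.url), BLACKLIST):
            continue
        if needle not in f"{s.title} {s.text}".lower():
            continue  # assumed relevance check: the snippet must mention the email
        kept.append(s)
    # sorted() is stable, so non-whitelisted results stay in search-rank order
    return sorted(kept, key=lambda s: not in_list(domain_of(s.url), WHITELIST))
```

An empty list signals that none of the top three snippets qualified, which is where the page-scanning fallback would take over.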

Another issue is that Google actively fights scraping and blocks abusive IP addresses after a certain number of requests. To solve this, we used our proprietary system that collects free proxies from the Internet (in most cases this saves our clients a significant amount of money, since they don't need paid proxy services). Routing queries through proxies makes Google see them as coming from many different locations, and whenever a proxy is banned, it is replaced with another one so that scraping continues uninterrupted.
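A rotating proxy pool of the kind described above might look like the following sketch. `ProxyPool` and the `fetch_free_proxies` callable are hypothetical stand-ins for the proprietary collector, and how a ban is detected (for example a 429 response or a CAPTCHA page) is an assumption, since the case study does not say.

```python
import random
import threading
from typing import Callable, Iterable


class ProxyPool:
    """Rotating pool of "host:port" proxies; banned ones are dropped and replaced."""

    def __init__(self, fetch_free_proxies: Callable[[], Iterable[str]], min_size: int = 10):
        # fetch_free_proxies is a hypothetical stand-in for a free-proxy collector
        self._fetch = fetch_free_proxies
        self._min_size = min_size
        self._active: list[str] = []
        self._banned: set[str] = set()
        self._lock = threading.Lock()
        with self._lock:
            self._refill()

    def _refill(self) -> None:
        """Add freshly collected proxies that are neither active nor banned (lock held)."""
        for proxy in self._fetch():
            if proxy not in self._banned and proxy not in self._active:
                self._active.append(proxy)

    def get(self) -> str:
        """Pick a random active proxy so queries are spread across many IPs."""
        with self._lock:
            if not self._active:
                self._refill()
            if not self._active:
                raise RuntimeError("no usable proxies available")
            return random.choice(self._active)

    def ban(self, proxy: str) -> None:
        """Retire a proxy that Google has blocked and top the pool back up."""
        with self._lock:
            if proxy in self._active:
                self._active.remove(proxy)
            self._banned.add(proxy)
            if len(self._active) < self._min_size:
                self._refill()
```

Banned proxies are remembered, so a later refill from the collector cannot bring them back into rotation.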

Because each proxy requires delays between requests, we sped the process up by running searches in simultaneous threads (up to 800).
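The idea of hiding per-proxy delays behind many concurrent threads can be sketched with Python's `concurrent.futures`. The 800-thread ceiling comes from the text; the 5-second delay and the `search_one` and `pick_proxy` callables are assumptions for illustration. Each worker reserves a time slot on its proxy before searching, so no single proxy is queried faster than the delay allows, while other threads keep working through other proxies.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 800      # upper bound mentioned in the case study
PER_PROXY_DELAY = 5.0  # assumed pause (seconds) between requests through one proxy

_last_used: dict[str, float] = {}
_last_used_lock = threading.Lock()


def wait_for_proxy(proxy: str, delay: float = PER_PROXY_DELAY) -> None:
    """Block until `delay` seconds have passed since `proxy` was last reserved."""
    while True:
        with _last_used_lock:
            now = time.monotonic()
            last = _last_used.get(proxy)
            if last is None or now - last >= delay:
                _last_used[proxy] = now  # reserve this slot for the caller
                return
            remaining = delay - (now - last)
        time.sleep(remaining)  # sleep outside the lock so other proxies stay usable


def enrich_all(emails, search_one, pick_proxy,
               threads: int = MAX_THREADS, delay: float = PER_PROXY_DELAY) -> dict:
    """Run search_one(email, proxy) for every email across a thread pool."""
    def task(email):
        proxy = pick_proxy()
        wait_for_proxy(proxy, delay)
        return email, search_one(email, proxy)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(task, emails))
```

With many proxies in rotation, the per-proxy delay barely limits overall throughput: while one proxy waits out its pause, the other threads are already querying through different proxies.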

3 million emails

92% data accuracy

Solution

We successfully enriched the database of 3 million email addresses within a short time frame, reaching up to 92% data accuracy.
