How Web Scraping Drives Data-Driven Telecom Market Research


Oleg Boyko

Introduction

In the telecommunications market, competition is rapidly intensifying due to the emergence of alternative internet connection options such as fixed wireless access (FWA), satellite internet, and fiber-optic networks. According to Deloitte, this is giving consumers more choices and forcing traditional operators to rethink their strategies to attract new customers and retain existing ones.

In particular, the growing adoption of FWA and satellite internet is a key driver of competition in the industry, improving service quality and lowering costs. As households, particularly in Europe and the U.S., gain access to a wider range of connection options, companies are increasingly using web scraping tools to track market changes and analyze data on network coverage. 

In this article, we explore how web scraping can help telecom companies collect up-to-date information on 1 Gb internet coverage. Using the German market as an example, we show how this data allows companies to make strategic decisions based on accurate insights.

Why Is Data Needed in Telecom Market Research?

Market research is the process that helps companies gain deeper insights into the market, customer needs, and competitor actions. During research, data is collected, trends are analyzed, and opportunities for growth are identified. As a result, companies can more accurately forecast demand, develop effective promotional strategies, and make decisions that strengthen their competitiveness. In today’s fast-moving information age, marketing research has become more complex, requiring the use of advanced technologies for data collection and processing.

One such technology is web scraping, which automates the extraction of data from multiple web resources. In the context of marketing research, web scraping is especially useful as it allows companies to quickly obtain information that would require significant resources using other methods.

For example, companies can collect data on prices, reviews, competitor offerings, and network coverage. This is particularly important in the telecommunications sector, where companies are eager to know about the quality and availability of high-speed internet, such as 1 Gb, in various regions. In our example, such data helps operators determine where there is growth potential, where to focus efforts on improving infrastructure, or how to adjust marketing campaigns for regions with high demand. Web scraping speeds up and increases the accuracy of this process, providing companies with a competitive advantage.
Marketing data is essential for telecom companies, which use it to analyze customer needs and market trends and develop effective strategies.

Data Sources for Telecom Market Research

To better understand the market, we selected Deutsche Glasfaser and Telekom, the two largest providers in Germany. These companies are leading players in the telecommunications services market, offering both mobile and fiber-optic internet services. Both providers have a wide customer base and extensive geographical coverage, making them the most relevant for analyzing the current state of internet services in Germany.

On their websites, it is possible to check the availability of 1 Gb internet at a specific address. This process involves manually filling out a form on the website, after which one of three possible statuses is displayed:

  • Available
  • Not available
  • Availability planned for a future date (including the time frame for when access will become available)



Such a check works well for individual addresses, but analyzing the entire market means tens of millions of checks. Doing this manually is impossible, so automation is required to gather and process the necessary data efficiently.

Approach to Data Collection

Data collection begins with assessing the target website: we evaluate how quickly the data can be extracted, check for anti-scraping protections, and develop a strategy to bypass them if needed. In the case of Deutsche Glasfaser and Telekom, no protections were found on the websites, making the process relatively simple and straightforward. Even if protection had been present, it would not have been a barrier to data collection, as there are methods for circumventing various protection mechanisms.

Deutsche Glasfaser

To check internet availability, we need to verify each address on the list individually. After studying the data structure, we discovered that the site encodes addresses as a specific code, such as 64291ABD-1-.

In this format:

  • 64291 is the postal code,
  • ABD is the city abbreviation,
  • -1- is the house number.



Here’s how it works:

1. First, we collect the unique identifiers for all cities listed on the website.

2. Next, we gather the street identifiers for each city.

3. We then generate a unique house ID using the format {street_id}-{house_number}-.

4. With this house ID, we send an API request to get detailed information for each address.

5. Finally, we analyze the results to see if each address has internet coverage or not.

This approach ensures we manage the data efficiently and get accurate results.
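
To make this flow more concrete, here is a minimal Python sketch of steps 3–5. Only the {street_id}-{house_number}- composition follows the scheme described above; the API endpoint, parameter names, and response fields are placeholders, since the exact API contract is not detailed here.

import requests

# Hypothetical endpoint and field names: the article specifies the ID scheme
# ({street_id}-{house_number}-), not the exact API contract.
AVAILABILITY_API = "https://example.com/api/availability"

def build_house_id(street_id: str, house_number: str) -> str:
    """Compose the address code, e.g. street_id '64291ABD' + house '1' -> '64291ABD-1-'."""
    return f"{street_id}-{house_number}-"

def check_address(street_id: str, house_number: str) -> str:
    """Query the (placeholder) availability API for a single address."""
    house_id = build_house_id(street_id, house_number)
    response = requests.get(AVAILABILITY_API, params={"id": house_id}, timeout=30)
    response.raise_for_status()
    # Expected outcomes mirror the three statuses described earlier:
    # available, not available, or planned for a future date.
    return response.json().get("status", "unknown")

print(check_address("64291ABD", "1"))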

Telekom

While working with this site, we noticed a small difference in how Telekom handles data. The site uses POST requests to check internet availability, which send additional information to the server, unlike GET requests that simply retrieve data. 

The possible statuses returned for each address are:

  • The address is within the coverage area.
  • The address is outside the coverage area.
  • Coverage is planned for the future, with a specified time range (e.g., from 2026 to 2027).



This distinction required us to adjust our approach by including the additional parameters in the POST requests to ensure accurate results. As part of this data collection strategy, we:

1. Extract the CSRF token from the page to execute requests.

2. Generate requests for each address.

3. Automate the CSRF token update as it expires.

4. Execute requests to obtain coverage data.
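
A simplified sketch of this CSRF workflow is shown below, using Python's requests library. The URLs, token markup, and form fields are illustrative placeholders; the actual Telekom endpoints and payloads are not described in detail here.

import re
import requests

# Placeholder URLs and token markup: the article describes the CSRF workflow,
# not the concrete endpoints, form fields, or page structure.
CHECK_PAGE = "https://example.com/availability-check"     # page that embeds the token
CHECK_API = "https://example.com/api/availability-check"  # POST endpoint

session = requests.Session()

def fetch_csrf_token() -> str:
    """Load the check page and extract the CSRF token embedded in its HTML."""
    html = session.get(CHECK_PAGE, timeout=30).text
    match = re.search(r'name="csrf_token"\s+value="([^"]+)"', html)  # hypothetical markup
    if not match:
        raise RuntimeError("CSRF token not found")
    return match.group(1)

def check_address(address: dict, token: str) -> dict:
    """POST the address parameters together with the CSRF token."""
    resp = session.post(CHECK_API, data={**address, "csrf_token": token}, timeout=30)
    if resp.status_code == 403:  # token expired -> refresh it and retry once
        token = fetch_csrf_token()
        resp = session.post(CHECK_API, data={**address, "csrf_token": token}, timeout=30)
    resp.raise_for_status()
    return resp.json()

token = fetch_csrf_token()
print(check_address({"zip": "10115", "street": "Invalidenstrasse", "house": "1"}, token))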

General Strategy for Both Sites:

  • Automation: All steps are automated through scripts based on Scrapy. This allows checking millions of addresses without the need for manual data input (see the sketch after this list).
  • Proxy Use: Rotating proxies are used to optimize the process, distributing requests and avoiding blocks.
  • Cost Calculation: Each request consumes a certain amount of traffic, and based on this, the cost of data collection can be accurately calculated depending on the number of addresses.
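
As an illustration of what such Scrapy-based automation can look like, the outline below follows the city → street → house sequence described for Deutsche Glasfaser. All URLs, JSON fields, and the house-number range are hypothetical placeholders, not the providers' actual APIs.

import scrapy

class CoverageSpider(scrapy.Spider):
    """Illustrative spider outline; all URLs, JSON fields, and ranges are placeholders."""
    name = "coverage"
    start_urls = ["https://example.com/api/cities"]  # hypothetical city listing

    def parse(self, response):
        # Steps 1-2: collect city identifiers, then request each city's streets.
        for city in response.json():
            yield response.follow(
                f"https://example.com/api/cities/{city['id']}/streets",
                callback=self.parse_streets,
            )

    def parse_streets(self, response):
        # Steps 3-4: compose house IDs and query availability for each of them.
        for street in response.json():
            for house_number in range(1, 200):  # illustrative house-number range
                house_id = f"{street['id']}-{house_number}-"
                yield response.follow(
                    f"https://example.com/api/availability?id={house_id}",
                    callback=self.parse_availability,
                    cb_kwargs={"house_id": house_id},
                )

    def parse_availability(self, response, house_id):
        # Step 5: store the coverage status for later analysis.
        yield {"house_id": house_id, "status": response.json().get("status")}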

Data Delivery Methods

Automation and efficient data collection strategies significantly reduce manual input and optimize the process, especially when it comes to analyzing large-scale datasets, such as millions of addresses. 

When it comes to data delivery, it is important to understand the key stages of the process and how they impact overall costs and timeframes. Below, we outline how data collection occurs, from scraper development to maintaining the process for large-scale data.

Development Phase: The first step is developing the scraper, a tool that automatically gathers information from the required web resource. If a new data source is required, a separate scraper is developed, ensuring maximum flexibility and adaptability to any requests. After development, thorough testing is conducted to ensure the tool works correctly and can collect the necessary data.

Minimal Changes in Cost as Data Volume Increases: A notable aspect of working with large volumes of data is that whether you need to collect 1 million records or 30 million, the final cost will differ only slightly. This is because the main costs are associated with scraper development, while restarting the scraper remains relatively inexpensive, even as the volume of data increases.

Periodic Data Collection: Once the scraper is ready and tested, it can be set up for periodic collection. When ongoing automation is needed, the scraper runs at set intervals, ensuring regular updates without additional effort. The main advantage is that a scraper developed once can be reused indefinitely, significantly reducing costs and speeding up the process.

Monitoring and Support: If the volume of data is large, requiring weeks or months of collection, additional time must be allocated for the work of data engineers. Their role is to monitor the scraping process, ensure its smooth operation, and promptly make adjustments if data structures or sources change. This approach helps avoid downtime and ensures the high accuracy and relevance of the collected data.

With the foundational steps in place for efficient data collection, the next crucial aspect to consider is how this data is delivered and utilized. In the following section, we will explore the formats and methodologies for delivering the collected data effectively.

Data Delivery Formats

When delivering data, it’s crucial to account for the fact that different clients may have varying internal processes for working with data. Understanding these differences allows us to provide the results in the most suitable formats or adapt them to specific needs. Flexibility in delivery ensures that the data seamlessly integrates into the client’s workflow, enhancing efficiency and usability.

 The main options are:

  • Simple Formats: Excel and CSV are the most commonly used formats for delivering data. They are convenient for analysis and working with tables, and can be easily opened and edited in most programs.
  • API Access: Another delivery option is providing access via API (Application Programming Interface). This method allows clients to automatically retrieve data through requests to our server, which is especially convenient for integrating data into automated processes or systems (see the sketch after this list).
  • Client System Integration: If the client has their own data processing system, we can set up a process where data is automatically uploaded directly into their system. This provides convenience and eliminates the need for additional data processing by the client.
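
As an illustration, a client consuming API-based delivery might pull data with a few lines of Python. The endpoint, authentication scheme, and response structure below are hypothetical; the actual API contract is agreed with each client.

import requests

# Hypothetical endpoint, key, and response schema; the real API contract is agreed per client.
DELIVERY_API = "https://data.example.com/v1/coverage"
API_KEY = "your-api-key"

def fetch_coverage(region: str, page: int = 1) -> list:
    """Pull one page of coverage records for a region from the delivery API."""
    resp = requests.get(
        DELIVERY_API,
        params={"region": region, "page": page},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

records = fetch_coverage("Hessen")
print(len(records), "records received")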



By offering multiple data delivery formats, we ensure that clients can receive and use the data in the way that best fits their existing processes. Whether through simple files, APIs, or direct integration into internal systems, the goal is to provide flexibility and efficiency tailored to each client’s needs.

Proxy and Additional Expenses

Using proxies is an integral part of the data collection process, regardless of whether the site is protected or not. Proxies help mask our location and distribute requests, making data collection more efficient and secure. There are several reasons why proxies are necessary:

1. Geographic Accessibility: Some sites may be inaccessible to users from certain regions, so local proxies allow access to these sites.

2. Avoiding Blocks: Frequent or high-volume access may lead a site to flag the activity as unnatural and block it. Proxies help distribute the load and avoid blocks.

3. Additional Expenses: The use of proxy servers affects the overall cost of scraping. Depending on the volume of data and the frequency of requests, renting multiple proxies may be necessary, especially when working in different regions. These costs are included in the total cost of the data collection service.

4. Ethical Scraping: Ethical norms must be followed during data collection. One of the key points is scraping speed: request rates must be set so that source websites are not overloaded and their performance is not impaired. Violating this rule may result in blocks or a slowdown of the site, which is unethical and can negatively affect collaboration.
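
Below is a minimal sketch of how rotating proxies and polite request rates can be configured in a Scrapy project. The proxy addresses and module path are placeholders, and the throttling values are illustrative rather than recommendations.

import random

# settings.py excerpt (Scrapy): throttle requests so source sites are not overloaded.
DOWNLOAD_DELAY = 0.5                 # pause between requests to the same domain
CONCURRENT_REQUESTS_PER_DOMAIN = 4
AUTOTHROTTLE_ENABLED = True          # adapt the request rate to server responsiveness
DOWNLOADER_MIDDLEWARES = {
    "myproject.middlewares.RotatingProxyMiddleware": 350,  # module path is illustrative
}

# middlewares.py: a minimal rotating-proxy middleware; proxy addresses are placeholders.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]

class RotatingProxyMiddleware:
    """Assign a random proxy to every outgoing request to spread the load."""

    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)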

Conclusion

In the fast-evolving telecommunications market, where new technologies like FWA and satellite internet are intensifying competition, companies need real-time data to make informed strategic decisions. Web scraping serves as a vital tool for gathering large-scale data on network coverage, competitor offerings, and demand trends. By leveraging this approach, telecom companies can swiftly adjust their strategies, optimize infrastructure investments, and maintain a competitive edge.

GroupBWT stands as an expert in providing custom web scraping solutions that help telecom companies automate data collection, giving them a significant advantage. Our solutions are flexible, reliable, and cost-effective, enabling businesses to unlock the full potential of data-driven decision-making.

Contact us today to learn how our tools can help your company stay ahead of the competition through precise analytics and optimized data solutions!

Ready to discuss your idea?

Our team of experts will find and implement the best Web Scraping solution for your business. Drop us a line, and we will get back to you within 12 hours.

Contact Us