How Real-Time Airport Scraping Unlocked Scalable Flight Delay Verification for a Passenger Rights Platform

See how GroupBWT helped a legal-tech platform verify flight delays across 15+ EU airports by building a real-time tracking system based on direct airport data scraping.

The Client Story

A global platform specializing in air travel disruption claims needed real-time access to verified flight status data to determine compensation eligibility under EU regulation. Third-party data providers failed to deliver complete coverage, missing 20–30% of flights, especially from regional and secondary airports, resulting in unresolved cases, slower legal processing, and lost revenue.

Manual website checks weren’t scalable, while vendor feeds lacked the granularity and timestamp precision required for compliant claims. To close this critical verification gap, the company sought direct, high-frequency scraping of live departure and arrival boards across EU airports, ensuring structured, timestamped flight delay tracking at the source.

Industry: Travel
Cooperation: Since 2025
Location: Worldwide

We used to rely on slow, fragmented flight data—some from outdated vendor feeds, some from manual checks. Verifying delays meant switching tabs, cross-referencing records, and hoping it matched the airport. Now, we know within minutes what happened—and when.

The real shift wasn’t just data speed. It was data integrity. Having airport-sourced records with timestamped status changes gave our legal team confidence to escalate claims faster, without ambiguity.

The Challenge

When Third-Party Data Breaks the Legal Chain of Proof

A request came in to track flights every 15 minutes across major EU airports, starting with Amsterdam, Barcelona, and Bucharest.

The goal wasn’t broad scraping. It was structured monitoring:

  • Departure airport, arrival airport
  • Scheduled and updated times
  • Flight date (day of scraping)
  • Flight number, airline IATA code
  • Status changes (boarding, gate closed, cancelled, etc.)
  • Time of scraping, time of status change
  • Gate and aircraft type, if published
  • Codeshare records

Each record needed to reflect the moment a passenger-facing change occurred and be logged in a schema that legal teams could use as admissible evidence.

The project expanded to 15+ airports. The client evaluated several vendors. What tipped the decision in our favor was not the technical pitch, but the ability to deliver on the actual conditions of the claim lifecycle: timing accuracy, audit compatibility, and adaptability under UI changes or site protection.

The Solution

A Real-Time, Modular Airport Flight Tracking Architecture

The system polled the departure and arrival boards of each target airport every 15 minutes. This created a rolling, timestamped dataset reflecting changes as they occurred.

Each airport presented different technical constraints. Some rendered content server-side in HTML tables, while others used JavaScript frameworks behind session protections. The system was built to detect and adjust for each layout type.

Schema Structure Designed for Case Integration

Each record matched the client’s legal framework:

  • Airport identifiers (departure, arrival)
  • Airline code and flight number
  • Scheduled vs. actual time
  • Status as string
  • Time of scrape
  • Time of status change (where present)
  • Gate number and aircraft type, when exposed
  • Codeshare flights

This data was stored in a format that internal systems could read directly. It was intended as case evidence rather than for visual display.
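As an illustration only, the field list above could map to a record type like the following. The class name, field names, and JSON serialization are assumptions made for this sketch, not the client's actual schema, and the sample flight "XX123" is a placeholder.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class FlightStatusRecord:
    """One status observation for one flight (hypothetical field names)."""
    departure_airport: str                  # IATA airport code
    arrival_airport: str                    # IATA airport code
    airline_code: str                       # airline IATA code
    flight_number: str
    scheduled_time: datetime
    status: str                             # status string as shown on the board
    scraped_at: datetime                    # time of scrape
    actual_time: Optional[datetime] = None
    status_changed_at: Optional[datetime] = None  # only where the airport publishes it
    gate: Optional[str] = None              # only when exposed
    aircraft_type: Optional[str] = None     # only when exposed
    codeshares: tuple[str, ...] = ()

    def to_json(self) -> str:
        """Serialize to JSON with ISO 8601 timestamps."""
        def encode(value):
            if isinstance(value, datetime):
                return value.isoformat()
            raise TypeError(f"cannot serialize {type(value).__name__}")
        return json.dumps(asdict(self), default=encode, sort_keys=True)


# Placeholder values, not real flight data.
record = FlightStatusRecord(
    departure_airport="AMS",
    arrival_airport="BCN",
    airline_code="XX",
    flight_number="XX123",
    scheduled_time=datetime(2025, 3, 1, 12, 30, tzinfo=timezone.utc),
    status="Gate closed",
    scraped_at=datetime(2025, 3, 1, 12, 45, tzinfo=timezone.utc),
    codeshares=("XX456",),
)
print(record.to_json())
```

Optional fields default to `None` so that a missing gate or aircraft type is recorded explicitly as absent rather than guessed.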

Compliance wasn’t optional. The system was designed from the ground up to respect access protocols, avoid site disruption, and fit our internal audit criteria. We didn’t need a workaround—we needed reliability. And we got it.

Alex Yudin
Web Scraping Team Lead

Record Creation Triggered by Status Change

Rather than collecting every visible listing at every interval, the system logged a new entry only when the flight status changed.

For example:

  • 12:15: Flight marked “Open for boarding”
  • 12:45: Status updates to “Gate closed”
  • 13:10: Status changes to “Cancelled”

This would yield three distinct records—each logged with time, status, and airport context. The result was a machine-verifiable timeline, not a static snapshot.
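The change-triggered logging described above can be sketched as follows. This is a minimal illustration that assumes an in-memory store and hypothetical names (`ChangeLogger`, `StatusEvent`, the placeholder flight "XX123"). A production system would persist the last seen status between polls.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FlightKey = Tuple[str, str]  # (flight number, flight date)


@dataclass(frozen=True)
class StatusEvent:
    flight_key: FlightKey
    status: str
    observed_at: str  # time of the poll that first saw this status


class ChangeLogger:
    """Remembers the last seen status per flight and records only transitions."""

    def __init__(self) -> None:
        self._last_status: Dict[FlightKey, str] = {}
        self.events: List[StatusEvent] = []

    def observe(self, flight_key: FlightKey, status: str, observed_at: str) -> bool:
        """Log an event if the status differs from the previous one; return True if logged."""
        if self._last_status.get(flight_key) == status:
            return False  # unchanged since the previous poll, so no new record
        self._last_status[flight_key] = status
        self.events.append(StatusEvent(flight_key, status, observed_at))
        return True


logger = ChangeLogger()
flight = ("XX123", "2025-03-01")  # placeholder flight

# Times follow the example above; the repeated polls are added for illustration.
polls = [
    ("12:15", "Open for boarding"),
    ("12:30", "Open for boarding"),  # no change, nothing logged
    ("12:45", "Gate closed"),
    ("13:00", "Gate closed"),        # no change, nothing logged
    ("13:10", "Cancelled"),
]
for observed_at, status in polls:
    logger.observe(flight, status, observed_at)

for event in logger.events:
    print(event.observed_at, event.status)
```

Five polls produce three events, one per transition, which matches the three-record timeline in the example.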

Site Defense Compatibility

Many airport pages were protected with Cloudflare, JavaScript rendering engines, or token-based session handling.

The system managed this through:

  • Proxy rotation
  • Session tracking
  • Real-time detection of DOM structure changes
  • Configurable alerting when layout drift occurs

This allowed the client to monitor sites without causing performance issues or triggering blocks.
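One simple way to detect layout drift is to compare the column headers a page exposes against the headers the parser expects. The sketch below assumes a server-rendered HTML table and uses only the Python standard library. The header names and the alert message are hypothetical, and the original system may have detected DOM changes differently.

```python
from html.parser import HTMLParser
from typing import List


class HeaderExtractor(HTMLParser):
    """Collects the text of every <th> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: List[str] = []
        self._in_th = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            self._in_th = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_th:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            self.headers.append("".join(self._buffer).strip().lower())
            self._in_th = False


def detect_layout_drift(html: str, expected_headers: List[str]) -> List[str]:
    """Return the expected headers missing from the page (empty list means no drift)."""
    parser = HeaderExtractor()
    parser.feed(html)
    parser.close()
    found = set(parser.headers)
    return [h for h in expected_headers if h.lower() not in found]


# Hypothetical expected columns and sample pages.
EXPECTED = ["flight", "destination", "scheduled", "status"]
page_ok = (
    "<table><tr><th>Flight</th><th>Destination</th>"
    "<th>Scheduled</th><th>Status</th></tr></table>"
)
page_changed = (
    "<table><tr><th>Flight</th><th>To</th>"
    "<th>Scheduled</th><th>Status</th></tr></table>"
)

missing = detect_layout_drift(page_changed, EXPECTED)
if missing:
    print(f"ALERT: layout drift, missing columns: {missing}")
```

Checking the structure before parsing rows means a redesigned page raises an alert instead of silently producing incomplete records.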

The Results

Verified Flight Status Tracking Outcomes That Support Legal Decision-Making

Operational Improvements

  • Faster legal case resolution: Structured updates made claim validation immediate, reducing rework and approval delays.
  • Analyst time reclaimed: Manual rechecking was no longer needed; attention shifted to escalation-only cases.
  • Higher evidentiary quality: Structured event chains backed by airport data lowered rejection rates and increased compliance confidence.
  • No post-processing required: Schema-aligned output matched internal intake systems directly.
  • Airport expansion without redesign: Each new site required only structural configuration—no rebuild or logic rewrite.

What Enabled This Stability

  • Airport-specific scraping logic—not generic libraries
  • All data pathways mapped, timed, and traceable
  • No dependence on third-party vendor uptime or API licensing
  • Optional source code handoff or managed service configuration
  • Monitoring, pricing, and maintenance structured for predictability

Source-level tracking becomes essential when claim eligibility depends on when, how, and where a flight status was published. This system established control over the verification process, not through volume, but through visibility, timing, and format integrity.

How Ongoing Support and Scaling Were Structured

Depending on internal capacity, the client chose between source-code ownership and full-service deployment.

The system was configured for long-term use with minimal rework. Support covered change requests, monitoring, and adaptive reconfiguration in case airport pages were updated.

Clients could either host internally or request a managed setup, with source-specific billing and predictable maintenance terms. Adding new endpoints required only structural mapping—no system redesign.
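To illustrate what "structural mapping" might look like, the sketch below keeps everything airport-specific in a configuration entry, so adding an airport means adding an entry rather than changing shared logic. All names (`AirportConfig`, `REGISTRY`, `normalize_row`), the URLs, and the column labels are assumptions for this example, not the system's real configuration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AirportConfig:
    iata: str
    board_url: str               # placeholder URLs below
    render_mode: str             # "html" for server-rendered tables, "js" for script-rendered boards
    column_map: Dict[str, str]   # site column label -> schema field name
    poll_minutes: int = 15


REGISTRY: Dict[str, AirportConfig] = {}


def register(config: AirportConfig) -> None:
    """Add an airport to the registry after basic validation."""
    if config.render_mode not in {"html", "js"}:
        raise ValueError(f"unknown render mode: {config.render_mode}")
    if config.iata in REGISTRY:
        raise ValueError(f"airport already registered: {config.iata}")
    REGISTRY[config.iata] = config


def normalize_row(iata: str, raw_row: Dict[str, str]) -> Dict[str, str]:
    """Rename site-specific columns to schema field names, dropping unmapped ones."""
    column_map = REGISTRY[iata].column_map
    return {column_map[label]: value
            for label, value in raw_row.items() if label in column_map}


# Hypothetical entries: adding an airport is a configuration change only.
register(AirportConfig(
    iata="AMS",
    board_url="https://example.com/ams/departures",
    render_mode="js",
    column_map={"Flight": "flight_number", "Status": "status"},
))
register(AirportConfig(
    iata="BCN",
    board_url="https://example.com/bcn/departures",
    render_mode="html",
    column_map={"Vuelo": "flight_number", "Estado": "status"},
))

print(normalize_row("BCN", {"Vuelo": "XX123", "Estado": "Cancelado", "Extra": "x"}))
```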

What Can Replace Third-Party Flight Data for Claims Verification?

By switching from vendor feeds to direct airport scraping, the platform gained a complete, timestamped record of each flight’s status lifecycle—boarding, delay, and cancellation—mapped to schema-aligned formats ready for legal intake and regulatory action.

  • 95–100% of flight records verified
  • <15 min legal logging latency
  • 13h saved per analyst weekly
