Price Comparison Website And Custom Scraper
How many times have you been about to buy something online but decided to wait for a better price, and then found yourself refreshing the product page several times a day, hoping for a miracle?
What if you need to monitor hundreds of products? How time-consuming and annoying would that be? One of our clients had had enough of this routine and decided to build a system that does everything automatically and notifies him of price changes by email.
In short, the goal was to create a web application that compares product prices across multiple US shops and notifies end users when prices go down. The client was interested in websites such as 6pm.com, amazon.com, bestbuy.com, ebay.com, homedepot.com, macys.com, microsoft.com/en-us/store/b/home, newegg.com, rakuten.com, and walmart.com, plus around 20-30 smaller ones.
- Determining an accurate and relevant product price.
This may sound obvious, but every online shop is different. Although shops can sell the same products, core details vary. Prices differ from shop to shop: some shops show discounts directly on the product page, while others apply a discount only after the product has been placed in the shopping cart. The user may also have found a coupon for a specific shop elsewhere and want to use it for the purchase. Moreover, each shop charges a different amount for shipping, and that amount can change from day to day. The main challenge was to foresee all of these nuances so that the system we were building would be accurate.
- Processing a large amount of data. The end product had to be scalable enough to keep working no matter how many people use the system simultaneously. Our task was to maintain uptime, protect users from being banned, and make sure that the parsing process doesn't take ages.
We therefore decided to run the parsing in multiple parallel processes using PHP's PCNTL functions. The scraper uses proxies and a time delay between requests to avoid bans.
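To illustrate the idea of parallel scraping with proxy rotation and a delay between requests, here is a minimal sketch. It is written in Python with a thread pool rather than PHP with PCNTL process forking, so it shows the pattern and not the project's actual code. The names `scrape_all` and `fetch`, the round-robin proxy assignment, and the default values are all assumptions made for the example; `fetch` is any callable you supply that downloads a URL through a given proxy.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, proxies, fetch, delay_seconds=1.0, workers=4):
    """Fetch every URL in parallel, rotating proxies and pausing after each request.

    `fetch(url, proxy)` is a caller-supplied function (hypothetical here) that
    returns the page body. Returns a list of (url, proxy, body) tuples in input order.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")

    # Assign proxies to URLs in round-robin order.
    jobs = list(zip(urls, itertools.cycle(proxies)))

    def run(job):
        url, proxy = job
        try:
            body = fetch(url, proxy)
        except Exception:
            body = None  # a failed request should not stop the other workers
        time.sleep(delay_seconds)  # throttle this worker before its next request
        return url, proxy, body

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, jobs))
```

With a real HTTP client plugged in as `fetch`, raising `delay_seconds` or lowering `workers` slows the crawl and reduces the chance of a ban.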
A user uploads a list of product URLs as a CSV file. The system loads them into the database and links shops to products so that one product can be assigned to multiple shops, which makes it possible to view a product's prices at several shops at once. The system then scrapes the product data: price, title, category, shipping cost, and availability.
In the backend, a user can schedule the parsing process and set a delay between requests for every hour of the day. For instance, at night the delay is longer (and scraping is correspondingly slower), while during the day scraping runs at maximum speed.
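The import step could look roughly like the sketch below. It assumes a CSV with `product` and `url` columns, derives the shop from the URL's host name, and stores everything in an in-memory SQLite database. The column names, table layout, and function names are hypothetical; the original system's schema is not described.

```python
import csv
import sqlite3
from urllib.parse import urlparse

# Hypothetical schema: `listings` links products to shops (many-to-many).
SCHEMA = """
CREATE TABLE shops (id INTEGER PRIMARY KEY, host TEXT UNIQUE);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE listings (
    product_id INTEGER REFERENCES products(id),
    shop_id INTEGER REFERENCES shops(id),
    url TEXT UNIQUE
);
"""


def get_or_create(db, table, column, value):
    """Insert the value if it is new and return the row id either way."""
    db.execute(f"INSERT OR IGNORE INTO {table} ({column}) VALUES (?)", (value,))
    row = db.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,)).fetchone()
    return row[0]


def import_csv(db, csv_file):
    """Load rows with assumed columns `product` and `url` and link products to shops."""
    for row in csv.DictReader(csv_file):
        host = urlparse(row["url"]).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        shop_id = get_or_create(db, "shops", "host", host)
        product_id = get_or_create(db, "products", "name", row["product"])
        db.execute(
            "INSERT OR IGNORE INTO listings (product_id, shop_id, url) VALUES (?, ?, ?)",
            (product_id, shop_id, row["url"]),
        )
    db.commit()


def shops_for_product(db, name):
    """Return the hosts of all shops that list the given product."""
    rows = db.execute(
        "SELECT s.host FROM listings l "
        "JOIN shops s ON s.id = l.shop_id "
        "JOIN products p ON p.id = l.product_id "
        "WHERE p.name = ? ORDER BY s.host",
        (name,),
    ).fetchall()
    return [host for (host,) in rows]
```

Keeping the product-to-shop link in its own table is what lets one product appear at several shops and be compared side by side.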
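A per-hour delay schedule like this can be represented as a simple lookup table. The sketch below is only an illustration: the delay values and the hours treated as "night" are made-up assumptions, not the client's settings.

```python
from datetime import datetime

# Hypothetical values: seconds to wait between requests.
NIGHT_DELAY = 10.0
DAY_DELAY = 1.0

# One entry per hour of the day (0-23); 23:00-06:59 is treated as night here.
HOURLY_DELAY = {
    hour: (NIGHT_DELAY if hour < 7 or hour >= 23 else DAY_DELAY)
    for hour in range(24)
}


def delay_for(moment: datetime) -> float:
    """Return the delay between requests that applies at the given time."""
    return HOURLY_DELAY[moment.hour]
```

A scraper would call `delay_for(datetime.now())` before each request and sleep for the returned number of seconds.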
The system watches price changes and notifies a user when the price goes down. Each product price goes through the following process:
- adding the shipping cost, based on the user's location;
- checking the price against external coupons previously added by the user;
- recalculating the price for hidden discounts (some shops give a discount only when a product is added to the shopping cart).
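The three steps above can be sketched as a single calculation. This is a minimal example under stated assumptions: the cart discount is a fixed amount, the coupon is a percentage applied after the cart discount, and neither affects shipping. The source does not say how the real system orders or combines these adjustments, and `Offer` and `effective_price` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Offer:
    listed_price: Decimal
    shipping_cost: Decimal                    # already resolved for the user's location
    cart_discount: Decimal = Decimal("0")     # assumed: fixed amount taken off in the cart
    coupon_percent: Decimal = Decimal("0")    # assumed: external coupon as a percentage


def effective_price(offer: Offer) -> Decimal:
    """Return what the user would actually pay, rounded to cents."""
    price = offer.listed_price - offer.cart_discount
    price -= price * offer.coupon_percent / Decimal("100")
    price = max(price, Decimal("0"))          # discounts never push the price below zero
    return (price + offer.shipping_cost).quantize(Decimal("0.01"))
```

For example, a product listed at 100.00 with a 10.00 cart discount, a 10% coupon, and 5.99 shipping comes out at 86.99. `Decimal` is used instead of floats to avoid rounding surprises with money.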
When the price drops to the amount previously set by a user, the product is added to a section with the most attractive deals and prices, where the user can monitor it. Because shipping is part of the calculation, the user is notified even if the listed price hasn't changed but the shipping cost has.
Let us know if you need something similar, or take a look at the other price monitoring solutions we've created.
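One way to express this rule is to compare total costs (price plus shipping) rather than listed prices. The sketch below assumes that a notification is sent whenever the total drops and that a product counts as a deal once the total is at or below the user's target. The function name and return shape are hypothetical.

```python
from decimal import Decimal
from typing import Optional, Tuple


def check_deal(
    previous_total: Optional[Decimal],
    current_total: Decimal,
    target_total: Decimal,
) -> Tuple[bool, bool]:
    """Return (notify, is_deal) for one product.

    Totals include shipping, so a cheaper shipping rate alone can trigger both flags.
    `previous_total` is None the first time a product is scraped.
    """
    notify = previous_total is not None and current_total < previous_total
    is_deal = current_total <= target_total
    return notify, is_deal
```

If a product stays at 50.00 while shipping falls from 10.00 to 5.00, the total goes from 60.00 to 55.00; with a target of 55.00 the function reports both a notification and a deal.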