Hands On #1: Testing the Bright Data Web Unlocker proxy
What is Bright Data Web Unlocker?
Digging a bit deeper into Bright Data’s website, we can better understand how this works. Directly from the product page:
Limits requests per IP
Manage IP usage rates so you don’t ask for a suspicious amount of data from any one IP
Emulates a real user
Automated user emulation including: starting on the target’s homepage, clicking their links, & making human mouse movements
Imitates the right devices
Web Unlocker emulates the right devices that servers expect to see
Calibrates referrer header
Makes sure the target website sees that you are landing on their page from a popular website
Honeypots are links that sites use to expose your crawlers. Automatically detect them and avoid their trap
Sets intervals between requests
Automated delays are randomly set between requests
All these features can be summed up with the following picture.
It seems like a good solution, and an easy one to integrate into our scrapers, since it basically works like adding a proxy to them.
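To give an idea of what "adding a proxy" means in a Scrapy project, here is a minimal sketch. It relies on Scrapy's built-in HttpProxyMiddleware, which reads the `proxy` key from a request's `meta`. The host, port, and credentials are placeholders I made up for illustration, not Bright Data's real values: the actual ones come from your account dashboard.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Build an http://user:pass@host:port proxy URL.

    Credentials are percent-encoded so that characters such as '@' or ':'
    do not break the URL structure.
    """
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"http://{user}:{pwd}@{host}:{port}"


def proxy_meta(proxy_url: str) -> dict:
    """Return the meta dict that routes a Scrapy request through a proxy."""
    # Scrapy's HttpProxyMiddleware looks for the "proxy" key in Request.meta.
    return {"proxy": proxy_url}


# Placeholder values (assumptions): replace them with the ones from your account.
PROXY_URL = build_proxy_url("my-user", "p@ss:word", "proxy.example.com", 8000)

# Inside a spider, the request would then look like this:
#   yield scrapy.Request(url, meta=proxy_meta(PROXY_URL), callback=self.parse)
```

The rest of the spider stays untouched, which is the main appeal of this kind of product.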
Our testing methodology
To test this kind of product, I’ve developed a plain Scrapy spider that retrieves 10 pages from 5 different websites, one for each anti-bot solution tested (Datadome, Cloudflare, Kasada, F5, PerimeterX). For every page it returns the HTTP status code, a string from the page (needed to check whether the page was loaded correctly), the website, and the anti-bot name.
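As an illustration of the record described above, here is a small sketch of how each response could be turned into an output row. This is not the code of the test scraper: the `PageRecord` class, the `build_record` helper, and the idea of looking for a known marker string in the body are my assumptions for the example.

```python
from dataclasses import asdict, dataclass


@dataclass
class PageRecord:
    """One output row of the test (hypothetical field names)."""
    website: str
    antibot: str
    status: int
    page_string: str   # string found on the page, empty if missing
    loaded_ok: bool    # True only when the real page came back


def build_record(website: str, antibot: str, status: int,
                 body: str, marker: str) -> dict:
    """Build the output row for a single response.

    `marker` is a string we expect on the real page (assumption): a
    challenge or block page can return HTTP 200 too, so the status
    code alone is not enough to tell whether the page was loaded.
    """
    found = marker in body
    record = PageRecord(
        website=website,
        antibot=antibot,
        status=status,
        page_string=marker if found else "",
        loaded_ok=(status == 200 and found),
    )
    return asdict(record)


# In a Scrapy callback this could be used as:
#   yield build_record(site, antibot, response.status, response.text, marker)
```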
The base scraper cannot correctly retrieve any of the records, and this will be our baseline result.
As a result of the test, we’ll assign a score from 0 to 100, depending on how many URLs are retrieved correctly over two runs, one in a local environment and the other from a server. A score of 100 means that the anti-bot was bypassed for every URL given as input in both tests, while our starting scraper has a score of 0, since it could not get past the anti-bot for any of the records.
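One simple way to compute such a score is the percentage of correctly retrieved URLs across the two runs. The exact formula is my assumption here, and the `bypass_score` function name is hypothetical, but it respects the two endpoints described above (0 for the base scraper, 100 for a full bypass in both environments).

```python
def bypass_score(local_ok: int, server_ok: int, urls_per_run: int) -> float:
    """Return a 0-100 score from the successes of the two runs.

    local_ok / server_ok: URLs retrieved correctly in each run.
    urls_per_run: number of URLs given as input to each run.
    """
    if urls_per_run <= 0:
        raise ValueError("urls_per_run must be positive")
    for ok in (local_ok, server_ok):
        if not 0 <= ok <= urls_per_run:
            raise ValueError("successes must be between 0 and urls_per_run")
    # Both runs weigh the same: total successes over total attempts.
    return 100 * (local_ok + server_ok) / (2 * urls_per_run)
```

With this formula, a tool that works perfectly from a laptop but only half of the time from a server would land at 75.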
You can find the code of the test scraper in our GitHub repository, which is open to all our readers.