Hey, I'm Pierluigi, and I write for The Web Scraping Club.
I'm the co-founder of Databoutique.com, and I want to share my 10+ years of web scraping experience with you.

All Posts

Scraping E-Commerce websites 101

How to approach scraping e-commerce websites before you start coding: is the data you need on the PLP (product list page) or on the PDP (product detail page)?

Read more...

THE LAB #14: Scraping Cloudflare Protected Websites (early 2023 version)

It’s been a while since I wrote about Cloudflare solutions, and things evolve rapidly in this industry, so I’ve decided to update my old article about scraping Cloudflare-protected websites.

Read more...

Web Scraping news recap - February 2023

Legal updates and new tools available in February 2023 for the web scraping industry

Read more...

Hands On #1: Testing the Bright Data Web Unblocker proxy

Can Bright Data Web Unblocker defeat Datadome and other anti-bot solutions? We tested it in this post from The Web Scraping Club.

Read more...

Interview #6: Aleksandras Šulženko - Oxylabs

We interviewed Aleksandras Šulženko from Oxylabs and talked about proxies, AI, and web scraping used for a better world.

Read more...

THE LAB #13: Managing a fleet of scrapers with Scrapeops

Web scraping has become an essential tool for many businesses and organizations to gather data from the internet. However, large web scraping projects pose unique challenges that require careful planning and execution.

Read more...

Scraping Kasada-protected websites

How to scrape Kasada with Playwright, Undetected Chromedriver and other tools, with code and examples, for free.

Read more...

Introducing the Web Scraping 101 Wiki

Just as with Wikipedia, no single author can collect all the knowledge about web scraping. The only way to achieve decent coverage of all aspects of web scraping is to collaborate in a curated environment.

Read more...

THE LAB #12: Reverse-engineering Mobile API

Scrape data from apps using Charles Proxy and an Android emulator: a step-by-step guide to reverse-engineering an app's API

Read more...

Bypass Cloudflare with these web scraping tools

How to bypass the Cloudflare Browser Check with Python: let's see some setups to avoid error 403 while scraping

Read more...

Interview #5: Veritas - The anti obfuscation master

On de-obfuscation, hacking and bananas

Read more...

THE LAB #11: The Anti-Detect Anti-Bot matrix

An incomplete but still useful list of interesting resources on web scraping. Testing the most well-known web scraping tools in Python against Cloudflare, Kasada, PerimeterX, Datadome, and Shape

Read more...

The January 2023 recap for the Web Scraping industry

My end-of-year remarks, the birth of The Web Scraping Club, and more

Read more...

The most interesting GitHub Repositories about web scraping (2023)

An incomplete but still useful list of interesting resources on web scraping

Read more...

THE LAB #10: Bypass Cloudflare Bot Protection with GoLogin

A new way to scrape Cloudflare-protected websites using antidetect browsers

Read more...

How I built my homemade mobile proxy

About CGNAT, SSH, and the costs of a mobile proxy built with a Raspberry Pi

Read more...

Interview #4: Martin Ganchev - Smartproxy

On proxies, web scraping and pets

Read more...

THE LAB #9: Scraping OpenSea NFT data

Getting winners and losers of the Bored Ape Yacht Club collection transactions

Read more...

3 things + 1 to do before you start coding your scraper

Before you start coding your scraper, a good analysis of the target website can save you a lot of time. The first thing to do is check the tech stack of your target website.

Read more...

Kasada: Wanted a parka and got an "Error 429: Too many requests"

In this post from The Web Scraping Club, we’ll see why some websites throw a 429 error the first time we load them, before starting to work. TL;DR version: it’s because of Kasada's anti-bot solution.

Read more...

From 0 to 2 Billion Prices scraped per month

In this post on The Web Scraping Club blog, I’ll write about what we did at Databoutique.com to scale from 0 to 2 billion prices scraped per month, bootstrapped and with a minimal team of developers.

Read more...

A brief August wrap-up of the latest news on web scraping

A brief August wrap-up of the latest news about web scraping from around the world.

Read more...

The starter toolkit for a Python web scraping developer (2022)

Web scraping, as we all know, is a discipline that evolves over time, with more complex anti-bot countermeasures and new tools to use. Let's find out together which tools a Python web scraping developer can't do without.

Read more...

Is web scraping becoming harder?

Do you have the feeling that web scraping is becoming more difficult and expensive? I do: especially in the last 12 months, I've noticed an increasing number of websites using advanced anti-bot solutions.

Read more...

THE LAB #1: Scraping data from an app

I usually write in this newsletter about how to extract data from websites, but what if our target is an app with no web interface?

Read more...

The costs of web scraping

There's no doubt that cloud computing has enabled a wide range of new opportunities in the tech space, and this is also true for web scraping.

Read more...

THE LAB #2: scraping data from a website with Datadome and xsrf tokens

A real-world use case of a simple scraper that does not get blocked by Datadome

Read more...

Interview #1: Neha Setia - Zyte

Welcome to the first of our interviews. We'll break the ice with Neha Setia (@nehasetianagpal), developer advocate at Zyte, where she conducts workshops and enablement sessions for system integrators and clients at events.

Read more...

What's a proxy server?

Straight from Wikipedia, "In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource".

Read more...

THE LAB #3: Scraping Cloudflare protected websites

Cloudflare is an American company, based in San Francisco, offering services such as DDoS mitigation, distributed DNS, content delivery networks, and anti-bot protection for websites.

Read more...

On choosing the right proxy provider for scraping

Today we'll have a brief follow-up to the previous post, where we talked about proxies, how they work, and their different types.

Read more...

Create your first Python scraper with Scrapy

Scrapy is an open-source Python application framework designed for creating web scraping programs.

Read more...

THE LAB #4: Scrapyd - how to manage and schedule a fleet of scrapers

Web scraping is like eating cherries: one website leads to another, and you will soon find yourself with hundreds of scrapers scattered across your servers, scheduled haphazardly via crontab.

Read more...

The Kallax Index - Scraping Ikea websites

If you're at least a bit interested in economics, you've surely heard about the Big Mac Index by The Economist. We're doing the same but with the Kallax from IKEA.

Read more...

The Lab #5 - Scraping Airbnb.com using GraphQL

The travel industry has been one of the first to be impacted by digitalization. Booking.com, one of the largest websites for booking hotels around the globe, started its operations in 1997.

Read more...

Web Scraping News: October Monthly Recap

Meta settles a lawsuit against two companies scraping Facebook and Instagram data

Read more...

Interview #2: Neil Emeigh - Rayobyte

Welcome to our monthly interview. This time it’s the turn of Neil Emeigh, CEO at Rayobyte, a leading company in the proxy industry.

Read more...

THE LAB #6: Changing Ciphers in Scrapy to avoid bans by TLS Fingerprinting

TLS fingerprinting is a passive (or server-side) fingerprinting technique used by servers to identify the configuration of the clients connecting to them.

Read more...

Selenium vs Playwright, a comparison

In the web scraping industry, we often hear about Selenium and Playwright when there's a need for a headful scraper in Python (and, of course, Puppeteer for JS).

Read more...

The rise of antidetect browsers

As anti-bot and user profiling techniques become more and more invasive, a new niche of browsers has emerged: antidetect browsers.

Read more...

THE LAB #7: Scraping PerimeterX protected websites

PerimeterX is one of the most well-known anti-bot solutions, used by some of the top-tier websites on the net. They recently merged with Human Security, another company in the anti-bot industry but more focused on fraud prevention

Read more...

Web Scraping News: November Monthly Recap

In September, Jeremy Singer-Vine, a data journalist and computer programmer in New York, started the Data Liberation Project, which aims to create datasets from public government data that is not easily accessible.

Read more...

HTTP requests in Python explained

A basic introduction to making HTTP requests with different Python tools

Read more...

THE LAB #8: Using Bezier curves for human-like mouse movements

In computer graphics, we connect point A to point B with lines that can be categorized as straight or curved.

Read more...

Interview #3: Ondra Urban - Apify

Welcome to our monthly interview. This time it’s the turn of Ondra Urban, COO at Apify, a cloud platform that helps you build reliable scrapers, fast.

Read more...

Web Scraping experts: Is AI stealing our job?

For the few who went hiking without a phone in the past few weeks and didn’t have the chance to scroll their feed on LinkedIn, OpenAI released a new version of the GPT-3 model, called ChatGPT.

Read more...

The 2022 recap for the Web Scraping industry

2022 is coming to a close and, as usual, these last days are spent recapping what we achieved and what happened during the past year.

Read more...