Scraping Kasada-protected websites
How to bypass Kasada?
Over the past weeks, I’ve chatted with Dimitar Krastev of Crawlio about his recent discoveries about the Kasada anti-bot software, summarized in this LinkedIn post.
These notes integrate what we noticed in our previous articles, like our Anti-Detect Anti-Bot matrix or our latest post about the Bright Data Web Unblocker.
In this post, we’ll recap how we can scrape Kasada-protected websites using free and commercial tools. All the code can be found in our GitHub repository below.
GitHub Free Readers repository
What is Kasada and how does it work?
Kasada is one of the newest players in the anti-bot solutions market and has some peculiar features that make it different.
You cannot identify a Kasada-protected website with Wappalyzer (probably because its user base is not that wide), but the typical behavior when browsing one is the following.
First of all, Kasada doesn’t show any challenge in the form of a CAPTCHA. Instead, the very first request to the website returns a 429 error while a challenge, invisible to the user, runs in the browser. If this challenge is solved correctly, the page reloads and you are redirected to the original website.
This is basically what their website calls the Zero-Trust philosophy:
Kasada assumes all requests are guilty until proven innocent by inspecting traffic for automation with a process that’s invisible to humans.
Automated threats are detected on the first request, without ever letting requests into your infrastructure, including new bots never seen before.
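As a quick way to observe the first-request behavior described above, here is a minimal sketch that sends a single cookieless GET and reports whether it came back as a 429. The function names and the User-Agent value are illustrative assumptions, not part of any Kasada or scraping library, and a 429 on its own is only a hint: other systems use the same status code for plain rate limiting.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def first_response_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of one cookieless GET to `url`."""
    # Illustrative User-Agent; any value works for this check.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urllib raises on 4xx/5xx, but the status code is what we are after.
        return error.code


def is_blocked_on_first_request(status: int) -> bool:
    """True when the status matches the first-request 429 pattern."""
    return status == 429
```

Calling `is_blocked_on_first_request(first_response_status(url))` against a site you suspect is protected gives a first indication before reaching for heavier tools.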
There are usually two ways to bypass this kind of anti-bot:
- reverse engineering the JS challenges by deobfuscating the code
- making our scrapers indistinguishable from humans browsing the web
The first option is great for understanding what’s happening under the hood, but it’s time-consuming and works only until the code changes. Judging from their website, this seems to happen frequently.
Kasada is always changing to maximize difficulty and enable long-term efficacy.
Our own proprietary interpretive language runs within the browser to deter attackers from deciphering client-side code. Resilient obfuscation levels the playing field by shifting the skillset from easy to reverse engineer JS to our own polymorphic method.
Obfuscated code and the detection logic within are randomized and change dynamically to nullify prior learnings. All communications between the client and Kasada are encrypted.
So, given that we need to make our scrapers as human-like as possible, let’s see what we can do to bypass it.