Cloudflare is a content delivery network (CDN) and security service that helps protect websites from malicious traffic, including DDoS attacks. By the end of this post, you will better understand how to bypass Cloudflare protection.
Cloudflare bot management
Cloudflare is a security company that offers customers a Web Application Firewall (WAF) to defend against security threats such as credential stuffing, cross-site scripting (XSS), and DDoS attacks.
Alongside its web application firewall, Cloudflare offers a Bot Manager system, a bot protection solution that guards against malicious bots without affecting genuine users.
Cloudflare keeps an allowlist for known good bots, like Google and other search engines, but all other bots are assumed to be malicious and will likely be denied access to a Cloudflare-protected website.
Can Cloudflare be bypassed?
Despite Cloudflare’s security measures, it is possible to bypass its protection and reach the content of a Cloudflare-protected site, and several techniques exist for doing so. For example, a Cloudflare challenge is a challenge-response system that only grants access to the website after a correct response, so a client that supplies a valid response gets through.
Note that the challenge-response bypass is not foolproof: it can still be detected by Cloudflare, or by a website administrator who monitors the traffic.
403, 429, and 502 are among the most common HTTP status codes returned when Cloudflare’s bot protection blocks a request. These blocks can still be worked around, because the responses carry Cloudflare-specific error codes and descriptions that help identify the cause.
Cloudflare error 1020: access denied
“Access Denied” is among the most frequently encountered errors when attempting to scrape a Cloudflare-protected site. Unfortunately, it gives no information about the exact cause. To get past error 1020, you need to avoid detection altogether.
Cloudflare error 1009: access denied: country or region banned
If you receive a Cloudflare error 1009 message that states “… has banned the country or region of your IP address”, it is likely because the website owner has restricted access to certain countries. To get around this, you can use proxies located in a country that is allowed to access the site. For example, if the website is only available in the UK, you must use a UK proxy to bypass this error.
Cloudflare error 1015: you are rate-limited
It is essential to respect rate limits when web scraping, but these limits can be set very low. If your scraper is being rate-limited, it means that it is scraping too quickly. To avoid the Cloudflare 1015 issue, it is best to distribute the scraper traffic across multiple agents, such as proxies and browsers.
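As a rough illustration of spreading traffic across several agents while still respecting a per-agent delay, here is a minimal sketch. The class name, the agent labels, and the two-second interval are all hypothetical; the clock and sleep functions are injectable only so the behaviour can be checked without really waiting.

```python
import itertools
import time


class PoliteScheduler:
    """Rotate over agents and enforce a minimum delay per agent."""

    def __init__(self, agents, min_interval, clock=time.monotonic, sleep=time.sleep):
        self._cycle = itertools.cycle(agents)
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_used = {}  # agent -> timestamp of its last request

    def next_agent(self):
        """Return the next agent, sleeping first if it was used too recently."""
        agent = next(self._cycle)
        last = self._last_used.get(agent)
        if last is not None:
            wait = self._min_interval - (self._clock() - last)
            if wait > 0:
                self._sleep(wait)
        self._last_used[agent] = self._clock()
        return agent
```

Each call to `next_agent()` hands back the proxy or browser to use for the next request, so no single agent exceeds one request per `min_interval` seconds.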
Cloudflare error 1010: access denied: banned due to browser signature
This error means the request was blocked because of the client’s browser signature. Lastly, the Cloudflare challenge page (a browser check) does not indicate a block; it assesses the client’s trustworthiness. The best way to avoid having to solve the challenge is to raise your client’s overall trust rating.
In some cases, a captcha challenge may be requested, but the best way to bypass Cloudflare captcha is to avoid encountering it in the first place.
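The error codes described above can be recognised programmatically, which makes it easier to decide how a scraper should react. The sketch below is only an illustration: the descriptions come from the headings in this post, while the exact wording Cloudflare uses on its error pages is an assumption, so the pattern may need adjusting.

```python
import re

# Descriptions taken from the error headings in this post.
CLOUDFLARE_ERRORS = {
    1009: "access denied: country or region banned",
    1010: "access denied: banned due to browser signature",
    1015: "you are rate-limited",
    1020: "access denied",
}

# Assumed page wording: "Error 1020", "error code: 1020" or "Error code 1020".
_ERROR_RE = re.compile(r"error(?:\s+code)?\s*:?\s*(\d{4})\b", re.IGNORECASE)


def cloudflare_error(body):
    """Return (code, description) for a known Cloudflare error, else None."""
    for match in _ERROR_RE.finditer(body):
        code = int(match.group(1))
        if code in CLOUDFLARE_ERRORS:
            return code, CLOUDFLARE_ERRORS[code]
    return None
```

A scraper could, for instance, slow down on 1015 and switch proxy country on 1009.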
Methods Cloudflare uses to identify bots
Cloudflare utilises two main bot detection methods: passive and active.
- Passive detection methods like fingerprinting checks are conducted on the backend.
- On the other hand, active detection techniques involve checks performed on the client side.
Passive bot detection techniques
Cloudflare maintains a database of devices, IP addresses, and behaviours associated with malicious botnets. Any device considered a part of one of these networks will be blocked or presented with additional client-side challenges to solve.
IP address reputation
The reputation of user IP addresses (a risk score or fraud score) is determined by geolocation, ISP, and past reputation. For instance, IPs belonging to a data centre or VPN provider will typically have a poorer reputation than residential IPs. Also, sites may restrict access from regions outside their service area, as traffic from a genuine customer should never originate from there.
HTTP request headers
Cloudflare uses HTTP request headers to identify whether a user is a bot. A non-browser user agent, such as python-requests/2.22.0, makes the bot easy to spot. Cloudflare can also block the bot if request headers are missing or mismatched, for example, if a Sec-CH-UA-Full-Version-List header is sent together with a Firefox user agent.
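To make the header checks concrete, here is a minimal sketch of the kind of consistency test described above. It is not Cloudflare’s actual logic; the function name and the short list of non-browser user-agent prefixes are illustrative assumptions.

```python
def header_red_flags(headers):
    """Return a list of reasons why these request headers look automated."""
    # HTTP header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    ua = h.get("user-agent", "")
    flags = []

    if not ua:
        flags.append("missing User-Agent")
    elif ua.lower().startswith(("python-requests/", "curl/", "go-http-client/")):
        flags.append("non-browser User-Agent")

    # Client-hint headers (Sec-CH-UA-*) are sent by Chromium-based browsers,
    # so seeing one next to a Firefox User-Agent is a mismatch.
    has_client_hints = any(name.startswith("sec-ch-ua") for name in h)
    if "firefox/" in ua.lower() and has_client_hints:
        flags.append("Sec-CH-UA header sent with a Firefox User-Agent")

    return flags
```

Running your own scraper’s headers through a check like this before sending them is a cheap way to catch obvious mismatches.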
Fingerprinting with transport layer security (TLS)
TLS fingerprinting is a technique that Cloudflare’s antibot uses to identify the client sending requests to a web server. This method of fingerprinting TLS is beneficial because it produces a static fingerprint per request client, which differs from the fingerprint of other release versions, browsers, or request-based libraries.
For instance, the TLS fingerprint of a Chrome browser on Windows (version 104) would differ from that of a Chrome browser on Windows (version 87), a Firefox browser, a Chrome browser on an Android device, and the Python HTTP requests library.
Cloudflare analyses the fields in the ‘client hello’ message sent during the TLS handshake, hashes them into a fingerprint, and compares that fingerprint to a database of pre-collected fingerprints. If the client’s hash matches an allowed fingerprint, Cloudflare compares the request’s user-agent header to the user agent associated with the stored fingerprint. If they match, the request is permitted; if not, it is blocked as a sign of custom botting software.
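The flow above can be sketched with JA3, a publicly documented TLS fingerprinting scheme that hashes five ‘client hello’ fields. Whether Cloudflare uses JA3 specifically is not stated here, so treat that choice as an assumption; the field values, the `known_fingerprints` table, and the handling of unknown fingerprints are hypothetical as well.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 text form: five fields joined by ',', values by '-'."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join("-".join(str(v) for v in field) for field in fields)


def ja3_hash(ja3):
    """A JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


def check_request(fingerprint, user_agent, known_fingerprints):
    """Allow only if the fingerprint is known and the client family stored
    for it appears in the User-Agent header."""
    family = known_fingerprints.get(fingerprint)
    if family is None:
        return "unknown fingerprint"  # the text does not say what happens here
    return "allow" if family in user_agent else "block"


# Hypothetical 'client hello' values, for illustration only.
example = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
```

Because the hash depends on the TLS library rather than on the headers, changing only the user-agent string of an HTTP library does not change its fingerprint.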
Active bot detection techniques
Cloudflare uses web event listeners to detect user interactions, such as mouse movements, clicks, or key presses. Usually, a real user will be navigating the page with their mouse or keyboard. If Cloudflare notices a lack of mouse or keyboard activity, they can infer that the user is likely a bot.
In the past, CAPTCHAs were the go-to solution for detecting bots. However, it’s well-known that they can negatively impact the user experience. Whether or not Cloudflare serves a captcha to a user is based on a variety of factors, such as website configuration, risk level, and the type of browser being used.
For example, a user browsing a website with the Tor client is more likely to be served a CAPTCHA, while a user running a standard web browser such as Google Chrome may not be.
Even in these cases, bypassing Cloudflare protection is possible. Until 2020, Cloudflare relied on reCAPTCHA for its CAPTCHA service, after which it migrated to hCaptcha.
Browser environment API
- Browser-specific APIs: If the window.chrome API doesn’t exist when Cloudflare checks a browser, even though the browser headers, TLS, and HTTP/2 fingerprints indicate that it is Chrome, it is a sign of fake fingerprints.
- Automated browser APIs: Selenium’s automated browsers expose an API (window.document.__selenium_unwrapped) that Cloudflare can detect, indicating the user is not genuine.
- Sandbox browser emulator APIs: JSDOM, a sandboxed browser emulator running in Node.js, exposes a process object that exists only in Node.js.
- Environment APIs: A navigator.platform value of Linux x86_64 makes the request look suspicious if the user agent claims a macOS or Windows machine.
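The four checks in the list above all come down to comparing what the browser claims with what its JavaScript environment exposes. The sketch below assumes a hypothetical `env` dictionary holding values already read from the page; the key names are made up for illustration and do not correspond to any real API.

```python
def environment_red_flags(env):
    """Check a hypothetical snapshot of values read from the page's JS context."""
    flags = []
    ua = env.get("user_agent", "")
    platform = env.get("navigator_platform", "")

    # Browser-specific API: Chrome exposes window.chrome.
    if "Chrome/" in ua and not env.get("has_window_chrome", False):
        flags.append("User-Agent says Chrome but window.chrome is missing")

    # Automated browser API left behind by Selenium.
    if env.get("has_selenium_unwrapped", False):
        flags.append("Selenium marker present on document")

    # Sandbox emulator API: a Node.js process object should not exist in a browser.
    if env.get("has_node_process", False):
        flags.append("Node.js process object visible")

    # Environment API: the platform must agree with the OS in the User-Agent.
    if platform.startswith("Linux") and ("Windows NT" in ua or "Mac OS X" in ua):
        flags.append("navigator.platform is Linux but User-Agent names another OS")

    return flags
```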
How to bypass Cloudflare using the Google Cache?
If your data does not need to be completely fresh, scrape it from the Google Cache instead of the website itself; Google crawls most Cloudflare-protected websites to create a cache.
Scraping the Google Cache is only feasible if the website data you want remains unchanged for a substantial amount of time. To scrape the Google Cache, add the Google Cache prefix to the start of the page URL:
For example, if you would like to scrape https://www.petsathome.com/shop/en/pets/dog, then the URL to scrape the Google Cache version would be:
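The prefix itself is missing from the text above, so the sketch below fills it in with the commonly cited Google Cache address. Treat that value as an assumption and check that Google still serves cached pages before relying on it.

```python
# Assumption: this is the commonly cited Google Cache prefix; it is not given
# in the text above, and Google may no longer serve cached pages.
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def google_cache_url(url):
    """Prepend the cache prefix to the page URL."""
    return GOOGLE_CACHE_PREFIX + url


cache_url = google_cache_url("https://www.petsathome.com/shop/en/pets/dog")
```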
How to bypass the Cloudflare waiting room?
When you access a Cloudflare-protected website in your browser, you may be held in a waiting room for a few seconds. This is to verify that you are a real person, not a bot. If you are identified as a bot, you will be given an “Access Denied” error. Otherwise, you will be automatically redirected to the actual web page.
How long does the Cloudflare waiting room last?
Once you enter the waiting room, the time you spend there varies with the target’s security level and how your scraper performs on the tests; it can take up to five or ten seconds. After completing the challenge, you can browse the site without re-entering the waiting room.
Several general approaches exist to solve the client-side anti-bot challenges that run while you wait, for example:
- Using a library like JSDOM, you can emulate a browser in a sandbox, which is less resource-intensive and gives you finer control over what you want it to render.
Why is Cloudflare blocking me from websites?
Cloudflare automatically blocks suspicious-looking traffic, especially from non-human sources, to protect against DDoS attacks. Human verification or CAPTCHA methods are usually used to distinguish between a machine and a real person trying to access a website.
How can you fix Cloudflare blocking?
If you administer the site, create a new Cloudflare WAF rule that lets the affected IPs bypass the firewall so they are not blocked by automated processes. To do this, go to Security in the left side panel of the Cloudflare dashboard, click WAF, and create a new firewall rule.
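When creating such a rule, you need an expression that matches the IPs to let through. The sketch below builds one in the `ip.src in {...}` form used by Cloudflare’s rule expressions; the helper name is hypothetical, the addresses are documentation-range placeholders, and the rule’s action still has to be chosen in the dashboard.

```python
import ipaddress


def allow_expression(addresses):
    """Build an `ip.src in {...}` expression from IPs or CIDR ranges."""
    validated = []
    for item in addresses:
        # ip_network accepts both single addresses and CIDR ranges and
        # raises ValueError on anything malformed.
        network = ipaddress.ip_network(item, strict=True)
        is_single = network.prefixlen == network.max_prefixlen
        validated.append(str(network.network_address) if is_single else str(network))
    if not validated:
        raise ValueError("at least one address is required")
    return "(ip.src in {" + " ".join(validated) + "})"


expression = allow_expression(["192.0.2.10", "198.51.100.0/24"])
```

Validating the addresses first avoids pasting a malformed entry into the rule editor.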
How can you get unblocked by Cloudflare?
If the website owner has blocked your IP, country, or region, Cloudflare support cannot override the customer’s security settings. To resolve this, you must contact the site owner and ask to be unblocked.
How do you bypass Cloudflare WAF with the origin server IP addresses?
Cloudflare’s web application firewall (WAF) is now one of the most widely used. While it effectively blocks basic payloads, many ways exist to bypass it. New methods to mitigate attacks are being developed daily, so it is vital to test Cloudflare’s security continually.
- Begin your normal recon process by collecting as many IP addresses as possible using host, nslookup, whois, and IP ranges.
- Next, use tools like Netcat, Nmap, and Masscan to determine which of those hosts are running a web server.
- Once you have a list of web server IPs, use Burp to check whether the protected domain is set up as a virtual host on one of them. If it is not, you will get the default server page or website. If it is, you have found your entry point.
Find the origin IP address of the web server.
If you cannot find the origin IP using Burp Suite, there are other ways to get the origin IP address of a website’s server. Let’s explore them.
If your target website has SSL certificates (which is the case for most sites), they are likely indexed in the Censys database. If the website moved to the Cloudflare CDN after a certificate was issued, that certificate may still be associated with the original web server. You can search the Censys database for the website and see whether any servers are hosting the original website.
Email a non-existent address at the target’s domain to identify the IP address. If delivery fails, you will receive a bounce notification from the mail server that may include its IP address. Note that this only helps if the target does not use a third-party mail service provider.
To find the IP address of the origin server, you can also use the Censys database or Shodan to search the DNS records of subdomains, including A, AAAA, CNAME, and MX records.
Sometimes, other services such as subdomains, mail exchanger (MX) servers, FTP/SCP services, or other hostnames are hosted on the same server as the main website. These services are often not yet protected by the Cloudflare network and can expose the origin server, so it is essential to secure them as well.
After obtaining emails from the website you’re testing through subscribing to their newsletter, creating an account, using the “forgotten password” function, or ordering something, you need to review the source and mail headers of the emails. Record all IP addresses and subdomains you find, as they could lead to a hosting service.
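Reviewing mail headers by hand gets tedious, so the step above can be partly automated. This is a minimal sketch, assuming you have saved the raw message source as text; it only looks at IPv4 addresses in Received headers and ignores private ranges, and the function name is hypothetical.

```python
import ipaddress
import re
from email import message_from_string

_IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def public_ips_from_headers(raw_email):
    """Collect public IPv4 addresses from the Received headers, in order."""
    message = message_from_string(raw_email)
    found = []
    for header in message.get_all("Received", []):
        for candidate in _IPV4_RE.findall(header):
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # matched the pattern but is not a valid address
            if ip.is_global and candidate not in found:
                found.append(candidate)
    return found
```

The addresses it returns are only candidates: they may belong to a mail provider rather than to the web server.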
Utilise Burp Collaborator to help you with this process. Afterwards, try accessing the target domain using all the IPs and subdomains you’ve discovered.
If you reach the web application directly through the server IP, you bypass all protections Cloudflare offers.
How to bypass Cloudflare by scraping with fortified headless browsers?
Rather than using a vanilla headless browser, which anti-bot systems may detect due to its JS fingerprints, developers have released fortified headless browsers designed to look like a real user’s browser and patch the most significant leaks. This provides an alternative option for completing the scraping job.
- Puppeteer currently has a stealth plugin available.
- A stealth plugin for Playwright is also in development.
- For Selenium users, an optimized ChromeDriver patch called undetected-chromedriver is available.
Headless browsers such as Puppeteer, Playwright, and Selenium are known to share a common leak: the value of navigator.webdriver. This value is false in regular browsers but true in unfortified headless browsers.
Headless browser stealth plugins attempt to patch the more than 200 known headless browser leaks and can often bypass anti-bot services such as Cloudflare, PerimeterX, Incapsula, and DataDome. However, some leaks remain, and to make a headless browser look fully like a real browser, you must patch those yourself.
To make headless browsers more undetectable, they can be paired with high-quality residential or mobile proxies, which have higher IP address reputation scores than data centre proxies and are less likely to be blocked.
Using residential and mobile proxies can become costly due to charges per GB of bandwidth used. Additionally, a headless browser can consume an average of 2MB per rendered page.
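To see how quickly per-GB charges add up, here is a small back-of-the-envelope sketch. The 2MB per page comes from the figure above; the price per GB is a hypothetical placeholder, and 1 GB is treated as 1000 MB for simplicity.

```python
def proxy_cost(pages, mb_per_page=2.0, price_per_gb=10.0):
    """Estimate bandwidth (GB) and cost for rendering `pages` pages.

    mb_per_page defaults to the 2MB average quoted above; price_per_gb is a
    hypothetical rate, and 1 GB is treated as 1000 MB.
    """
    gigabytes = pages * mb_per_page / 1000
    return gigabytes, gigabytes * price_per_gb


gigabytes, cost = proxy_cost(100_000)
```

With these placeholder numbers, 100,000 rendered pages come to 200 GB of traffic, so substitute your provider’s real rate before budgeting.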
How to bypass Cloudflare using solvers?
If you cannot locate the origin server and cannot use the Google Cache, you must bypass Cloudflare directly. You can use one of the many Cloudflare solvers available to do this, like:
- cloudscraper
The best-performing Cloudflare solver at present is FlareSolverr.
FlareSolverr is a proxy server that lets you send requests that bypass Cloudflare and DDoS-GUARD protection. It forwards your requests to the Cloudflare-protected website using Puppeteer and the stealth plugin.
Installation is simple, as FlareSolverr can be installed on a server with Docker (Firefox browser already included). For each request, it waits until the Cloudflare challenge is solved (or times out) before returning the response and cookies to your scraper. These cookies can then be used to bypass Cloudflare with HTTP clients such as Python Requests, HTTPX, Node Axios, etc.
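As a rough sketch of that round trip, the code below builds a request for a locally running FlareSolverr instance and pulls the cookies out of its reply. The address, the `request.get` command, and the response field names follow FlareSolverr’s README as I understand it, so verify them against the version you install; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_payload(url, max_timeout_ms=60000):
    """JSON body asking FlareSolverr to fetch `url` with a GET request."""
    return {"cmd": "request.get", "url": url, "maxTimeout": max_timeout_ms}


def session_from_response(response):
    """Extract the cookies and User-Agent from a FlareSolverr response."""
    solution = response["solution"]
    cookies = {c["name"]: c["value"] for c in solution["cookies"]}
    return cookies, solution["userAgent"]


def solve(url):
    """Send the request to FlareSolverr (performs a network call)."""
    request = urllib.request.Request(
        FLARESOLVERR_URL,
        data=json.dumps(build_payload(url)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return session_from_response(json.load(reply))
```

When reusing the cookies in another HTTP client, it is generally safest to send the same User-Agent that `solve()` returned, since the two are likely checked together.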
How can you bypass Cloudflare by reverse engineering Cloudflare anti-bot protection?
Reversing the Cloudflare anti-bot protection system and creating a bypass that passes all Cloudflare’s anti-bot checks without the need for a fully fortified headless browser instance is a complex process. While this method may be effective, it is not for the faint of heart.
The benefit of this method is that if you need to scrape on a large scale and don’t want to launch hundreds (or even thousands) of expensive headless browser sessions, you can create the most resource-efficient Cloudflare bypass possible. This is tailored to bypass Cloudflare JS, TLS, and IP fingerprint tests.
This approach has drawbacks; you must delve into a complex anti-bot system that is purposely difficult to comprehend from the outside and experiment with different tactics to deceive the verification system. Additionally, you must keep up with this system as Cloudflare maintains and enhances its anti-bot protection.
How do you bypass Cloudflare using an intelligent proxy with Cloudflare built-in bypass?
Using private smart proxies instead of open-source Cloudflare solvers and pre-fortified headless browsers can be a better option, as anti-bot companies like Cloudflare can quickly patch the issues that open-source bypasses exploit.
This means that open-source Cloudflare bypasses have a limited shelf life of a few months before they become obsolete. Smart proxies, on the other hand, develop and maintain their own private Cloudflare bypass measures, which tend to last longer.
Smart proxy providers such as ScraperAPI, Scrapingbee, Oxylabs, and Smartproxy offer a range of Cloudflare bypass techniques that can be used with varying degrees of success.
Proxy companies that develop Cloudflare bypasses are typically more reliable, as they have a financial incentive to stay one step ahead of Cloudflare and fix any issues. These bypasses vary in cost and effectiveness, so it is vital to research the best option for your needs.
Shahrukh is a passionate cyber security analyst and researcher who loves to write technical blogs on different cyber security topics. He holds a Master’s degree in Information Security and an OSCP, and has a strong technical skillset in offensive security.