Cloudflare turns AI against itself with endless maze of irrelevant facts

On Wednesday, web infrastructure provider Cloudflare announced a new feature called “AI Labyrinth” that aims to combat unauthorized AI data scraping by serving fake AI-generated content to bots. The tool will attempt to thwart AI companies that crawl websites without permission to collect training data for large language models that power AI assistants like ChatGPT.

Cloudflare, founded in 2009, is probably best known as a company that provides infrastructure and security services for websites, particularly protection against distributed denial-of-service (DDoS) attacks and other malicious traffic.

Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” writes Cloudflare. “But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”

The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated from real scientific facts, such as neutral information about biology, physics, or mathematics, to avoid spreading misinformation (though whether this approach effectively prevents misinformation remains unproven). Cloudflare creates this content using its Workers AI service, a commercial platform that runs AI tasks.

Cloudflare designed the trap pages and links to remain invisible and inaccessible to regular visitors, so people browsing the web don’t run into them by accident.

Source: Cloudflare turns AI against itself with endless maze of irrelevant facts - Ars Technica

https://blog.cloudflare.com/ai-labyrinth/
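Out of curiosity, here's a rough sketch of what that pattern could look like as a Cloudflare Worker. To be clear, this is not Cloudflare's actual implementation, just an illustration of the idea: the User-Agent heuristic, the `env.AI` binding name, the model id, and the `?d=` maze-depth parameter are all assumptions made up for the example.

```ts
// Hypothetical sketch of the "labyrinth" idea, not Cloudflare's real code.
// Assumes a Workers AI binding named AI; the model id and bot heuristic are placeholders.

export interface Env {
  AI: { run(model: string, inputs: { prompt: string }): Promise<{ response: string }> };
}

// Naive stand-in for real bot detection (Cloudflare presumably uses far better signals).
const SUSPECT_UA = /(python-requests|scrapy|curl|GPTBot)/i;

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ua = request.headers.get("user-agent") ?? "";

    // Regular visitors never see the maze; pass them through to the real site.
    if (!SUSPECT_UA.test(ua)) {
      return fetch(request);
    }

    // Suspected crawlers get a plausible but irrelevant page of neutral facts...
    const { response } = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "Write three short paragraphs of accurate, neutral facts about plant biology.",
    });

    // ...plus a link one level deeper into the maze, so the crawler keeps spending resources.
    const depth = Number(new URL(request.url).searchParams.get("d") ?? "0") || 0;
    const next = `<a href="/?d=${depth + 1}" rel="nofollow">Further reading</a>`;

    return new Response(`<!doctype html><main>${response}</main>${next}`, {
      headers: { "content-type": "text/html; charset=utf-8" },
    });
  },
};
```

The `rel="nofollow"` on the maze link reflects the same concern the article raises: regular visitors and well-behaved crawlers shouldn't be steered into these pages, only the scrapers that ignore the usual signals.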


If it does not harm the Web then I’m all for it.

AI scrapers have been able to steal without fear of consequences or costs. It’s time we hit back.


Given that Cloudflare already “needs” to have humans verify themselves as not being bots, I’m worried that they’ll have false positives and send actual humans into the AI-generated maze instead of where they wanted to go.


Seeing big companies supposedly try to fight against AI for the sake of the consumer always makes me squint a little in suspicion. Especially when said companies are partnered with Google and Microsoft. Maybe I’m just jaded, but my first thought is, “Okay, that’s what they’re saying… now, what’s the real, profit-driven motive behind this?” If I had to wager a guess, I’d say this may be a marketing tactic to establish Workers AI as “one of the good ones”.

However, if it does as advertised, I do think this is a good alternative method to stop the bots, regardless of it being a marketing tactic. I’m a big fan of how it wastes the crawlers’ resources.

Yeah I’m skeptical for the same reasons you are. My guess is they plan to sell scraping rights or some other workaround to the AI companies.
