Blocking Claude (for now)

Lots of folks here discuss strategies for blocking LLM traffic from their sites, and it looks like a neat technique just dropped.

My disclaimer:
From my understanding, a lot of the AI-related traffic is scraping for training data rather than coming from AI inference tool calling. I highly doubt this will do anything to stop bot crawlers, but it looks like it will block Claude from seeing your content at runtime.


I wonder if we can stick that magic <code> inside a <noscript> so that it isn’t visible to most visitors.

I’m not sure. It ignoring the string in headers makes sense, since I assume they strip out anything they wouldn’t expect a human to read in order to save context. But that doesn’t explain why the <code> block is necessary to trigger it.

I had a quick play around with OpenAI’s tokenizer tool, and it looks like if you put the string in the middle of free-form text, the first token of the string can get mutated by a whitespace character before it, but NOT by a > before it.

Proof

No idea if this holds for Claude’s tokenizer; their tools are more annoying to use, so I haven’t tested it.
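
For anyone who wants to see the effect without a real tokenizer, here is a toy sketch. Everything in it is an assumption for illustration: the vocabulary is made up, `TRIGGER` is a placeholder rather than the actual string, and greedy longest-match is a stand-in for real BPE merging. It only shows how a preceding space can get absorbed into the first token while a preceding > cannot.

```python
# Toy greedy longest-match tokenizer. The vocabulary is invented for this
# example; it is NOT OpenAI's or Anthropic's, and real BPE works differently.
TOY_VOCAB = {"TRIGGER", " TRIGGER", "<", ">", "code", "some", " text"}


def tokenize(text, vocab=TOY_VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


# "TRIGGER" is a hypothetical placeholder, not the real magic string.
after_space = tokenize("some text TRIGGER")
after_tag = tokenize("<code>TRIGGER")
print(after_space)
print(after_tag)
```

In this toy vocabulary the space gets glued onto the placeholder, so the model would see a different first token than it does right after the closing > of a tag. Whether a real tokenizer behaves the same way for the real string is exactly the part that still needs testing.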

I’d rather just block their user agents at the server level; why let them read a page just to throw it away if we can throw HTTP 402 Fuck You, Pay Me at them instead?
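
A minimal sketch of what that could look like as Python WSGI middleware, assuming a Python-served site. The token list (`claudebot`, `claude-user`, `gptbot`) and the names `block_ai_agents`, `site`, and `status_for` are my own assumptions for illustration; check each vendor's published crawler documentation for the user-agent strings you actually want to match.

```python
from wsgiref.util import setup_testing_defaults

# Assumed substrings to look for in the User-Agent header (lowercased).
# Verify against each vendor's published crawler list before relying on it.
BLOCKED_AGENT_TOKENS = ("claudebot", "claude-user", "gptbot")


def block_ai_agents(app, blocked=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and answer matching user agents with HTTP 402."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked):
            body = b"Payment Required\n"
            start_response("402 Payment Required", [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else gets the real site.
        return app(environ, start_response)
    return middleware


def site(environ, start_response):
    """Hypothetical stand-in for the real application."""
    body = b"hello, human\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def status_for(user_agent):
    """Call the wrapped app once and return the status line it sent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    seen = []
    app = block_ai_agents(site)
    list(app(environ, lambda status, headers: seen.append(status)))
    return seen[0]


print(status_for("Mozilla/5.0 (compatible; ClaudeBot/1.0)"))
print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"))
```

The same check is probably cheaper in the web server or CDN config if you have one in front; the middleware version is just the easiest to show end to end.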

Yeah that works too :P


User agents can be whatever the client chooses them to be; it’s only a matter of time before the agents decide to send different headers to get results.

Exactly. Perplexity has already been accused by Cloudflare of using “stealth” user agents to pose as people. Now, that’s coming from Cloudflare, who have reason to market their AI honeypot and other AI bot blocking features, so take it with a grain of salt… but I don’t doubt for one second that unethical scrapers out there are using different user agents to bypass blocks.
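
To make the point concrete, here is a small sketch using only Python's standard library. The URL and the browser-style string are placeholders I picked for illustration, and nothing is actually sent over the network; it just shows that the header is whatever the client writes into it.

```python
import urllib.request

# Building a Request does not send anything; it only prepares the headers.
# The URL and user-agent string below are illustrative placeholders.
req = urllib.request.Request(
    "https://example.com/",
    headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"},
)

# urllib stores header names capitalized, hence "User-agent" here.
claimed_agent = req.get_header("User-agent")
print(claimed_agent)
```

A server that filters on this value alone has no way to tell that string apart from a real browser, which is why user-agent blocking only stops clients that identify themselves honestly.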

Thanks @CaffeineAndLasers for the tip re: Claude. I see that the author of that post has cleverly disguised the code block on their site. I won’t say how, because Anthropic is probably scraping this forum post as well. :upside_down_face: But everyone here can easily figure it out.