Request Amplification in Mastodon

A good thing to keep in mind if you’re posting links to your site on Mastodon!

I recently started a Mastodon account and posted a link to a blog post of mine. When I logged into my Cloudflare account this morning to make some changes to my site, I was concerned to see a massive upsurge in traffic and bandwidth use. Security logs showed that all of the traffic was coming from Mastodon user agents. None of these were legitimate users, just bots. For a second, I thought I’d accidentally ruffled someone’s feathers and was under some sort of DDoS attack, but found the above links after some searching. I set up some custom wildcard security rules to block all queries made from Mastodon user agents, and blocked almost a thousand requests in less than 30 minutes after creating the rule.
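For anyone who wants to do the same, a Cloudflare custom (WAF) rule along these lines should work. This is a sketch rather than my exact rule; adjust the user-agent substrings to match whatever shows up in your own security logs:

```
(http.user_agent contains "Mastodon") or
(http.user_agent contains "Pleroma") or
(http.user_agent contains "Akkoma")
```

Set the action to Block. You can sanity-check it with something like `curl -A "Mastodon/4.2.0" https://yoursite.example/` (Mastodon’s user agent contains the string "Mastodon") and confirming you get the block response instead of your page.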

One thing the above articles do not mention is that even posting your website’s link in your Mastodon profile generates requests to your website from Mastodon every single second. I tested this by pointing the link in my profile at my blog directory for approximately thirty seconds, and saw over 50 mitigated hits to that URL in my security logs during that window. When I removed the link, the onslaught stopped immediately.

I dunno why the heck Mastodon works like this, but I certainly wasn’t expecting it. It wouldn’t really have affected me, since I don’t pay for my hosting through Cloudflare and my site is small enough that the traffic isn’t an issue, but it’s definitely something you should be aware of if you aren’t already!


That’s why I have the following in .htaccess:

# Fediverse instances asking for previews?
# Computer says, ‘fuck you.’
RewriteCond %{HTTP_USER_AGENT} Mastodon|Friendica|Pleroma|Akkoma|Misskey [nocase]
RewriteRule .* - [F,L]
ErrorDocument 403 "non serviam"

Hahaha, I love it. Also, thanks for giving me more user agents to block!


You want stuff to block? I can hook you up.

# 410 gone for a long string of lowercase letters
# and/or numbers followed by an optional long extension
# to handle spam URLs like /spstyaaliti4csf6ne.desiringly
RewriteRule ^/?[a-z0-9]{12,30}(\.[a-z0-9]{8,30})?$ - [G,L]

# Fediverse instances asking for previews?
# Computer says, ‘fuck you.’
RewriteCond %{HTTP_USER_AGENT} Mastodon|Friendica|Pleroma|Akkoma|Misskey [nocase]
RewriteRule .* - [F,L]
ErrorDocument 403 "non serviam"

# Got referred here by Hacker News
# or a shitty search engine, social platform, or forum?
# Computer says, ‘fuck you.’
RewriteCond %{HTTP_REFERER} news.ycombinator.com [NC,OR]
RewriteCond %{HTTP_REFERER} facebook.com [NC,OR]
RewriteCond %{HTTP_REFERER} threads.net [NC,OR]
RewriteCond %{HTTP_REFERER} instagram.com [NC,OR]
RewriteCond %{HTTP_REFERER} bsky.app [NC,OR]
RewriteCond %{HTTP_REFERER} x.com [NC,OR]
RewriteCond %{HTTP_REFERER} reddit.com [NC]
RewriteRule .* . [redirect=402,last]
ErrorDocument 402 "Fuck you. Pay me."

RewriteCond %{HTTP_REFERER} 4chan.org [NC,OR]
RewriteCond %{HTTP_REFERER} forum.agoraroad.com [NC,OR]
RewriteCond %{HTTP_REFERER} twitter.com [NC]
RewriteRule .* . [redirect=403,last]
ErrorDocument 403 "Fuck off, Nazis."

RewriteCond %{HTTP_REFERER} kiwifarms.net [NC]
RewriteRule .* . [redirect=403,last]
ErrorDocument 403 "Chris Chan is Joshua Moon."

# This is a static site, assholes. Stop trying to look for shit to exploit.
RewriteRule \.php$ . [redirect=410,last]
RewriteRule \.aspx$ . [redirect=410,last]
RewriteRule \.asp$ . [redirect=410,last]
RewriteRule \.jsp$ . [redirect=410,last]
ErrorDocument 410 "non serviam"

# Filched from Alex Schroeder
# source: https://alexschroeder.ch/view/2025-03-21-defence-summary
RewriteCond "%{HTTP_USER_AGENT}" "!archivebot|^gwene|wibybot" [nocase]
RewriteCond "%{HTTP_USER_AGENT}" "bot|crawler|spider|ggpht|gpt" [nocase]
RewriteRule .* . [redirect=410,last]
ErrorDocument 410 "non serviam"

# Deny the image scraper
# https://imho.alex-kunz.com/2024/02/25/block-this-shit/
RewriteCond "%{HTTP_USER_AGENT}" "Firefox/72.0" [nocase]
RewriteRule .* . [redirect=410,last]
ErrorDocument 410 "non serviam"

# Google, SEO bot, or AI bot?
# Computer says, ‘fuck you.’
RewriteCond %{HTTP_USER_AGENT} (Headless) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (360Spider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AdsBot-Google) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AdsBot-Google-Mobile) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AhrefsBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AhrefsSiteAudit) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Amazonbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Applebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Applebot-Extended) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AwarioRssBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (AwarioSmartBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Baiduspider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (BingPreview) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Bytespider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (CCBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ChatGPT) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ChatGPT-User) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Claude-Web) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ClaudeBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DataForSeoBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Diffbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (DiscordBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Dotbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FacebookBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (FacebookExternalHit) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Feedfetcher-Google) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (GPTBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Google-Extended) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Google-Safety) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (GoogleOther) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Googlebot-Image) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Googlebot-Mobile) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Googlebot-News) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Googlebot-Video) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (HaoSouSpider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (ImagesiftBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (LinkedInBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (MJ12bot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Omgilibot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (PerplexityBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Pinterestbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Rogerbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot-BA) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot-COUB) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot-CT) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot-SI) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SemrushBot-SWA) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Semrushbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SiteAuditBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (SplitSignalBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (TelegramBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Turnitin) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Twitterbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (Yandex) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YandexBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YandexImages) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YandexRenderResourcesBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (YouBot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (adbeat_bot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (beskuttlebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (bingbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (cohere-ai) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (magpie-crawler) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (msnbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (msnbot-media) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (omgili) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (peer39_crawler) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (redditbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (skutlbot) [NC]
RewriteRule .* . [redirect=402,last]
ErrorDocument 402 "Fuck you. Pay me."


If a service like Discord fetches a page to generate a link preview, it can cache it for every Discord “server.”

Mastodon instances are independent so they don’t usually share a cache.

I think they tried to address the problem in part by adding a random delay in newer versions of Mastodon.


Different environment, same concept:

if ($http_user_agent ~* (mastodon|AhrefsBot|SemrushBot)) {
    return 418;
}

Decided to return a 418 rather than a 403 simply because I’m an idiot.
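If you ever want the same check outside a single `server` block, the more idiomatic nginx approach is a `map` in the `http` context feeding a variable, something like this (a sketch; the pattern list is just the same example agents as above):

```
# http context: set $blocked_agent to 1 for matching user agents
map $http_user_agent $blocked_agent {
    default                            0;
    "~*(mastodon|AhrefsBot|SemrushBot)" 1;
}

# server context: short-circuit blocked agents
if ($blocked_agent) {
    return 418;
}
```

The `map` is evaluated lazily per request, and keeping the regex in one place makes it easier to grow the list than scattering `if` blocks around.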


At least 418 is implemented. 402 isn’t, but I still use it with a quote from GoodFellas.


Ah, I didn’t notice you had the 402 in the second post; I only saw the 403 in the first one. That’s nice, haha, well done.

On at least one instance I’m on, the server doesn’t even fetch a site preview until I interact with the post in question, so there are definitely mitigations in place. Dunno if each instance caches site previews for its own members’ benefit, but it would be the sensible thing to do. That still leaves the issue of many different instances each hitting the original page once, which would explain why I sometimes see spikes of hundreds or even thousands of hits in my otherwise sedate traffic stats. Good thing my sites are static and most pages are tiny.

This totally explains why my host is saying that I’m running out of resources, and why my website was down most of the weekend.

So when I get home, I’m going to do what I kinda don’t want to do, and that is block all the Mastodon, Pleroma, and *key instances.

And I’m also going to benchmark my Raspberry Pi by posting a link to my website that’s hosted on said Pi.

Let’s see if this helps.

I was just going to respond to you on Bluesky, then went “hang on a second, I’ve seen that name before.” :slight_smile: Hopefully you won’t have any more issues with this going forward.

This really is a rather unfortunate thing … it makes me not want to share links to external sites on Mastodon, because I don’t want to accidentally tank someone’s website.