If you're posting links to Mastodon and use hashtags to increase reach, beware of the AI scrapers

I linked to one of my recent blog posts on Mastodon this past week, and was dismayed to discover that a new “follower” (I put that word in scare quotes for a reason) scraped the contents of my post a few seconds after it went up. It then uploaded a crappy multi-page AI-generated summary of the post to their website (a digital marketing / web dev company) without my permission and without providing ANY credit to me.

The account itself is a bot account. In the past twenty-four hours alone, it has scraped and uploaded PDF summaries of 31 separate links posted by authors all over Mastodon. I suspect it is listening to certain hashtags; I used #blog #blogging #indieweb #yaml #jekyll #webdev on the post that got scraped.

I really don’t care when bots automatically boost hashtagged posts (that’s actually helpful!), but I draw the freaking line at bots scraping and reposting my work on another website without credit. I suppose it’s on me for not realizing that there were probably bots out there in the Fediverse pulling this kind of shit, but anyway… just wanted to share this in case other people here feel strongly about not having their work gobbled up and regurgitated by an LLM. 🙄

6 Likes

Fediverse sites hosting such bots should be blackholed. No mercy.

4 Likes

Agreed. And because I have no qualms about naming and shaming when it comes to this sort of thing, here’s the bot in question:

What’s hysterical is that their shitty AI completely misinterpreted my post at times, and included bullet points in the PDF summary that actually contradict what I wrote.

I wouldn’t have known anything about this if the bot hadn’t auto-followed me when it scraped the contents of my link. I expect there are a lot more of these artificial hemorrhoids all over Mastodon, but here’s at least one to block.

3 Likes

I guess Mastodon doesn’t have an equivalent to “logged-in users only,” but I’m not sure it would help much anyway.

I found out today about the following project called “iocaine”

I track visits to my site with umami, and yesterday the same scraper I’ve been arguing with for weeks went totally crazy and discovered my wiki.

It’s been almost 24 hours since I set up iocaine, and I haven’t seen anything from the scraper since.

I do recommend that you read the documentation thoroughly, because if you have the wrong settings, it can even block search engine crawlers. I don’t really care about that, so I turned it off.

1 Like

If you haven’t already, I recommend posting about it to #fediblock so moderators can find out about it ASAP.

2 Likes

Not sure if this diverges too much from the original topic, but things like these scare the hell out of me. How are we supposed to even be on the Internet if everything we do can be stolen and replaced?