Use AI bullshit to remove podcast ads

convexer · September 29, 2025, 9:07pm

A one-shot CLI tool to remove ads from a podcast: rm-podcast-ads podcast.mp3. Uses ollama and ffmpeg to do all the work.
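For anyone wondering what the ffmpeg half of a tool like this might look like, here is a rough sketch. It is not taken from rm-podcast-ads; the helper names (keep_intervals, ffmpeg_command) and the idea of passing ad intervals in seconds are assumptions for illustration. It turns a list of detected ad intervals into the parts to keep and builds an ffmpeg command using the aselect and asetpts audio filters.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def keep_intervals(ads: List[Interval], duration: float) -> List[Interval]:
    """Return the parts of [0, duration] that are NOT covered by any ad interval."""
    keep: List[Interval] = []
    cursor = 0.0
    for start, end in sorted(ads):
        # Anything between the cursor and the next ad is content to keep.
        if start > cursor and cursor < duration:
            keep.append((cursor, min(start, duration)))
        cursor = max(cursor, end)
    if cursor < duration:
        keep.append((cursor, duration))
    return keep


def ffmpeg_command(src: str, dst: str, ads: List[Interval], duration: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that drops the ad intervals."""
    keep = keep_intervals(ads, duration)
    if not keep:
        raise ValueError("nothing left to keep after removing ads")
    # aselect keeps audio frames whose timestamp falls in any kept interval;
    # asetpts regenerates timestamps so the output has no gaps.
    expr = "+".join(f"between(t,{s:g},{e:g})" for s, e in keep)
    return ["ffmpeg", "-i", src, "-af", f"aselect='{expr}',asetpts=N/SR/TB", dst]
```

The resulting list can be handed to subprocess.run(cmd, check=True). Cutting this way re-encodes the audio, which is slower than stream copying but avoids having to split and concatenate files.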

It doesn’t work flawlessly but there’s a lot of low-hanging fruit for improvement:

someone who is good at the economy please help me prompt engineer this. my family is dying
All the random parameters like threshold and window_size need tuning
Or maybe you have an idea for a fundamentally better approach?

Sorry I’ve been so absent from discourse lately. Been busy coding and coping.

CaffeineAndLasers · September 29, 2025, 10:39pm

Very nice. I’ll have to give it a try later. One bit of “prompt engineering” I noticed: you may be able to get faster and more reliable results by constraining the output a bit more.

If you ask it to return ‘true’ or ‘false’ only, you can set the temperature to 0 (which makes it behave more deterministically) and cut off the generation as soon as you get the first token, since you only need a “T” or an “F” to infer the rest of what it is about to generate.
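A minimal sketch of that true/false idea against Ollama's local HTTP API, assuming the default endpoint on port 11434. The model name and prompt wording are placeholders, not what the tool actually uses. The temperature and num_predict options make the output deterministic-ish and stop generation after one token.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(chunk: str, model: str = "llama3.2") -> dict:
    """Build a request body asking for a one-token true/false verdict.

    The model name is a placeholder; use whatever model you have pulled.
    """
    prompt = (
        "Is the following podcast transcript excerpt an advertisement? "
        "Answer with exactly one word, true or false.\n\n" + chunk
    )
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # temperature 0 for repeatable output, num_predict 1 to stop after one token
        "options": {"temperature": 0, "num_predict": 1},
    }


def parse_verdict(text: str) -> Optional[bool]:
    """Infer true/false from the first letter; None if it is neither."""
    first = text.strip().lower()[:1]
    if first == "t":
        return True
    if first == "f":
        return False
    return None


def is_ad(chunk: str, model: str = "llama3.2") -> Optional[bool]:
    """Send one chunk to a locally running Ollama server and parse the reply."""
    body = json.dumps(build_request(chunk, model)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return parse_verdict(json.load(resp)["response"])
```

The tradeoff is that a one-token answer leaves no room for the model to reason before committing.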

My suggestion is to tell it to output in yaml with fields for “explanation” and “score”. The explanation can kinda act like the thinking step. Make sure this comes before the score so it thinks before it answers.

convexer · September 29, 2025, 11:04pm

Oof very nice idea. Indeed the reason I was asking for explanation was as a bit of forced thought but then I wanted the output to be easy to parse so I told it to give the answer first. Might do json instead of yml which will look a bit better in the debug logs.

convexer · September 29, 2025, 11:06pm

Hm, maybe I can just ask the llm directly for a score between 0 and 1 to allow it to express different degrees of confidence. Before, I had a “partial” response option but it just answered with that every time after coming up with decent arguments for both sides, lmao.
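Putting the explanation-first and 0-to-1 score ideas together, a JSON version could be sketched roughly like this. The prompt wording and the parse_reply helper are made up for illustration. Ollama also accepts a "format": "json" request field that should help keep the reply parseable, though the parser below still treats malformed output as a failure.

```python
import json
from typing import Optional, Tuple

# Hypothetical prompt: asks for the explanation BEFORE the score so the
# model "thinks" before it commits to a number.
PROMPT_TEMPLATE = (
    "Decide whether this podcast transcript excerpt is an advertisement.\n"
    'Reply with a JSON object with two keys, in this order: "explanation" '
    '(one or two sentences of reasoning), then "score" (a number from 0 to 1, '
    "where 1 means certainly an ad).\n\nExcerpt:\n{chunk}"
)


def parse_reply(reply: str) -> Optional[Tuple[str, float]]:
    """Return (explanation, score clamped to [0, 1]), or None if unparseable."""
    try:
        data = json.loads(reply)
        explanation = str(data["explanation"])
        score = float(data["score"])
    except (ValueError, KeyError, TypeError):
        # Invalid JSON, missing key, or a non-numeric score
        return None
    return explanation, min(1.0, max(0.0, score))
```

The explanation ends up in the debug logs for free, and the score can be compared against whatever threshold the tool already uses.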

CaffeineAndLasers · September 30, 2025, 1:40am

I have my own project trying to use LLMs to assign numeric scores. I’ve found it is biased toward the number 7 (out of 10) or 0.7 a lot. I think this is a reflection of everyone rating every movie a 7/10.

Maybe a set of discrete categories would work, however: Ad, Not Ad, Sponsor Read, Unsure.
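A small sketch of how those four categories could be parsed from a model reply. Which labels get cut is an assumption here (Ad and Sponsor Read), and anything unrecognized falls back to Unsure so that it is never removed by accident.

```python
from enum import Enum


class Label(Enum):
    AD = "ad"
    NOT_AD = "not ad"
    SPONSOR_READ = "sponsor read"
    UNSURE = "unsure"


# Assumption: only these two labels cause a segment to be removed.
CUT_LABELS = {Label.AD, Label.SPONSOR_READ}


def parse_label(reply: str) -> Label:
    """Map a free-text reply to a Label, defaulting to UNSURE."""
    text = reply.strip().lower().replace("_", " ").replace("-", " ")
    # Collapse repeated whitespace, then drop surrounding quotes and periods.
    normalized = " ".join(text.split()).strip(".\"'")
    for label in Label:
        if normalized == label.value:
            return label
    return Label.UNSURE


def should_cut(label: Label) -> bool:
    """True if a segment with this label should be removed from the audio."""
    return label in CUT_LABELS
```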

convexer · October 1, 2025, 7:50pm

OK I adjusted it to ask the LLM first to summarize the main content and ads separately and use this as part of the context. Computer is slow af though so haven’t tested it much.
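The summarize-first flow could look something like the sketch below. The prompt text and function names are guesses for illustration, not the tool's actual code. The generate argument stands in for whatever function sends a prompt to the LLM and returns its reply, so the summary is computed once per episode and reused for every chunk.

```python
from typing import Callable, List


def summarize_episode(transcript: str, generate: Callable[[str], str]) -> str:
    """Ask the model once for separate summaries of the main content and the ads."""
    prompt = (
        "Summarize this podcast transcript in two short sections, "
        "'Main content:' and 'Ads:'.\n\n" + transcript
    )
    return generate(prompt)


def build_classify_prompt(summary: str, chunk: str) -> str:
    """Build the per-chunk prompt, with the episode summary as context."""
    return (
        "Here is a summary of this episode's main content and its ads:\n"
        f"{summary}\n\n"
        "Using that context, is the following excerpt part of an ad? "
        "Answer true or false.\n\n"
        f"Excerpt:\n{chunk}"
    )


def classify_prompts(
    transcript: str, chunks: List[str], generate: Callable[[str], str]
) -> List[str]:
    """Summarize once, then build one classification prompt per chunk."""
    summary = summarize_episode(transcript, generate)
    return [build_classify_prompt(summary, chunk) for chunk in chunks]
```

Since the summary call sees the whole transcript, it is the slow part on a weak machine; caching its result per episode would avoid repeating it between test runs.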