I’ve recently given some talks for my school’s cybersecurity club, and now I want to edit the recordings.
My actual video editing needs are very simple: I just need to clip parts of the video out, which, as I understand it, basically every editor can do.
However, my videos were recorded on my phone, and I don’t have a presentation mic or anything like that, so background noise, including people talking, has slipped in. From what I understand, filtering general noise out of audio is easy, even “live” (during recording or during a game), because human voices occupy a characteristic frequency range; filtering out other voices is much harder.
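To illustrate why general noise is the easy case, here is a minimal Python sketch (assuming NumPy is installed) of a crude FFT band-pass filter. The 300-3400 Hz band is an assumption borrowed from the classic telephone voice band, and `bandpass_fft` / `band_energy` are hypothetical helper names; real noise-reduction tools are far more sophisticated than this.

```python
import numpy as np

# Assumption: treat roughly 300-3400 Hz (the classic telephone band) as "voice".
VOICE_LOW_HZ = 300.0
VOICE_HIGH_HZ = 3400.0


def bandpass_fft(samples: np.ndarray, sample_rate: int,
                 low_hz: float = VOICE_LOW_HZ,
                 high_hz: float = VOICE_HIGH_HZ) -> np.ndarray:
    """Crude band-pass: zero every FFT bin outside [low_hz, high_hz]."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(samples))


def band_energy(samples: np.ndarray, sample_rate: int, target_hz: float) -> float:
    """Magnitude of the FFT bin closest to target_hz."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(spectrum[np.argmin(np.abs(freqs - target_hz))])


if __name__ == "__main__":
    rate = 16000
    t = np.arange(rate) / rate                    # one second of samples
    hum = 0.5 * np.sin(2 * np.pi * 50 * t)        # mains hum, below the band
    speech_like = np.sin(2 * np.pi * 1000 * t)    # tone inside the band
    hiss = 0.3 * np.sin(2 * np.pi * 7000 * t)     # high-pitched noise, above the band
    cleaned = bandpass_fft(hum + speech_like + hiss, rate)
    for f in (50, 1000, 7000):
        print(f"{f} Hz: {band_energy(cleaned, rate, f):.1f}")
```

The catch is that a second person talking lands in the same band as the speaker, so a filter like this cannot touch them; that is the part that needs something like the voice-separation model.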
However, it seems that AI can do this:
https://scribe.rip/axinc-ai/voicefilter-targeted-voice-separation-model-6fe6f85309ea
However, it seems to work only on .wav audio files, meaning I would need to separate out the audio track first, convert it to WAV, and then merge it back in.
Before I go learning how to do this, I’m wondering whether there is already a FOSS video editor, or a plugin for one, that can filter the video directly, or similar software that works on the audio of videos.
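A rough sketch of that split/convert/re-merge pipeline, driving the `ffmpeg` command-line tool from Python. It assumes `ffmpeg` is on your PATH and that the input is an MP4; the helper and file names are hypothetical, and whether the model needs a particular sample rate is something to check in its own documentation.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def extract_audio_cmd(video: Path, wav_out: Path,
                      sample_rate: Optional[int] = None) -> List[str]:
    """ffmpeg command: drop the video (-vn) and write 16-bit PCM WAV audio."""
    # -y overwrites the output file without asking.
    cmd = ["ffmpeg", "-y", "-i", str(video), "-vn", "-c:a", "pcm_s16le"]
    if sample_rate is not None:
        # Resample only if the model expects a specific rate (check its docs).
        cmd += ["-ar", str(sample_rate)]
    cmd.append(str(wav_out))
    return cmd


def remux_cmd(video: Path, cleaned_wav: Path, out: Path) -> List[str]:
    """ffmpeg command: video from the original file, audio from the cleaned WAV."""
    return [
        "ffmpeg", "-y",
        "-i", str(video),        # input 0: original recording
        "-i", str(cleaned_wav),  # input 1: filtered audio
        "-map", "0:v:0",         # first video stream of input 0
        "-map", "1:a:0",         # first audio stream of input 1
        "-c:v", "copy",          # don't re-encode the video
        "-c:a", "aac",           # MP4-friendly audio codec
        "-shortest",             # stop at the shorter of the two streams
        str(out),
    ]


def run(cmd: List[str]) -> None:
    """Run an ffmpeg command, failing loudly if ffmpeg is missing or errors."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

Usage would be something like `run(extract_audio_cmd(Path("talk.mp4"), Path("talk.wav")))`, then running the separation model on `talk.wav`, then `run(remux_cmd(Path("talk.mp4"), Path("talk_clean.wav"), Path("talk_final.mp4")))`. Copying the video stream means only the audio is touched.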
The technique you linked is, even at a couple of years old, pretty cutting-edge. You aren’t going to find it or anything similar in any video editing software. If you are lucky, maybe someone has made a plugin for one.
However, there are a lot of free tools that make it easy to split or rejoin audio and video, and convert it between different formats.
I’d recommend:
- Audacity if you want a GUI
- FFmpeg if you want a command-line tool
- VLC can also do a lot of the conversions FFmpeg does if you dig through its features (it uses FFmpeg’s libraries under the hood for many formats)
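For the clipping part, here is a minimal sketch of how cutting with FFmpeg could look, again building commands from Python (hypothetical helper names, assuming `ffmpeg` is on PATH): cut out the segments you want to keep, then join them with FFmpeg’s concat demuxer.

```python
from pathlib import Path
from typing import List


def trim_cmd(src: Path, dst: Path, start_s: float, end_s: float,
             copy: bool = True) -> List[str]:
    """ffmpeg command that keeps only the part of src between start_s and end_s."""
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    cmd = [
        "ffmpeg", "-y",
        "-ss", f"{start_s:.3f}",           # seek in the input before decoding
        "-i", str(src),
        "-t", f"{end_s - start_s:.3f}",    # length of the kept segment
    ]
    if copy:
        # Stream copy: no re-encoding, but cuts are only as precise as the keyframes.
        cmd += ["-c", "copy"]
    cmd.append(str(dst))
    return cmd


def concat_list(parts: List[Path]) -> str:
    """Text for a list file read by ffmpeg's concat demuxer (one 'file' line per part)."""
    lines = []
    for p in parts:
        # A single quote inside a quoted path is written as '\'' in this format.
        escaped = p.as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'\n")
    return "".join(lines)


def concat_cmd(list_file: Path, dst: Path) -> List[str]:
    """ffmpeg command that joins the listed parts without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(dst)]
```

You would write the output of `concat_list` to a text file and pass that file to `concat_cmd`; relative paths in the list are resolved relative to the list file. If the keyframe-limited cuts are too imprecise, `copy=False` lets FFmpeg re-encode with its default encoders, which is slower but frame-accurate.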
Any AI solution you find is probably going to be a command-line/Python tool, and getting it working will likely require some debugging of your Python environment and dependencies. And that means, yes, you will need to separate the audio and video tracks and then recombine them. For that kind of work I’m only familiar with Linux tools; I’ve used one called VidCutter that is buggy but powerful and has a semi-intuitive GUI.
That said, the results from those AI tools can be a game changer if you can get them working.
It’s possible that it requires lossless audio for a reason: it may need an uncompressed signal to work. For instance, if the ML model was trained on uncompressed data, it may need audio that has never been compressed.
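If you want to confirm what you are actually feeding the model, Python’s standard `wave` module can read a WAV header, and since it only supports uncompressed PCM it doubles as a check that the file isn’t a compressed WAV variant. This is a sketch with hypothetical helper names; the expected sample rate and channel count are assumptions you would fill in from the model’s documentation.

```python
import wave
from dataclasses import dataclass
from typing import BinaryIO, List, Union


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    sample_width_bytes: int
    frames: int

    @property
    def duration_s(self) -> float:
        return self.frames / self.sample_rate


def inspect_wav(source: Union[str, BinaryIO]) -> WavInfo:
    """Read a WAV header. The wave module only supports uncompressed PCM,
    so a compressed WAV variant raises wave.Error instead of returning."""
    with wave.open(source, "rb") as w:
        return WavInfo(w.getnchannels(), w.getframerate(),
                       w.getsampwidth(), w.getnframes())


def check_model_input(info: WavInfo, expected_rate: int,
                      expected_channels: int) -> List[str]:
    """List mismatches against the (assumed) format the model expects."""
    problems = []
    if info.sample_rate != expected_rate:
        problems.append(f"sample rate {info.sample_rate} Hz, expected {expected_rate} Hz")
    if info.channels != expected_channels:
        problems.append(f"{info.channels} channel(s), expected {expected_channels}")
    return problems
```

Passing this check only shows the file is PCM now. Phone videos typically store audio in a lossy format such as AAC, and converting that to WAV does not bring back what the compression discarded.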