Audio Cloning

June 23, 2025 · 2 min read

#Python #OpenVoice #Melo TTS

This is an extension of the Video Translator tool that generates cloned audio for short subtitles, improving the quality of the audio output. It uses the OpenVoice and Melo TTS libraries to clone the audio generated by the Video Translator tool.

Challenges faced while integrating audio cloning:

Library Compatibility: The libraries did not work on Python 3.12, which I was using, but they did work on Python 3.10. Because of that, including the TTS logic in the same script was not possible.
Short Subtitles: The libraries need a longer audio sample to clone a voice well. Some of the subtitles were short, which led to errors when generating audio for them. I cloned the OpenVoice repository and updated the checks that required longer audio samples. I also added a check to skip subtitles that were too short.
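One way to work around the Python version mismatch is to keep the cloning code in its own script and start it with a Python 3.10 interpreter from the main tool. The snippet below is only a minimal sketch of that idea, not the exact code used in this project: the helper name `run_in_other_python`, the interpreter path, and the script name in the usage note are all hypothetical.

```python
import subprocess


def run_in_other_python(interpreter, script_args, timeout=600):
    """Run a script under a different Python interpreter and return its stdout.

    `interpreter` is the path to the other Python binary (for example the one
    inside a Python 3.10 virtual environment). This path is an assumption and
    depends on your own setup.
    """
    result = subprocess.run(
        [interpreter, *script_args],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        timeout=timeout,      # give up if the child process hangs
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call could look like `run_in_other_python("/path/to/py310-venv/bin/python", ["clone_audio.py", "output_folder"])`, where both the path and `clone_audio.py` are placeholders for your own environment and script.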

Fixes:

The openvoice/se_extractor.py file had a check to avoid saving audio segments shorter than 1.5 seconds. Some words, like hello, hi, and bye, were shorter than that. I relaxed the check so that segments shorter than 0.5 seconds are allowed as well.
Since the OpenVoice documentation explicitly says to install melo-tts, I included it in the requirements.txt file.
Added a script in which I can configure the path to the video-translator tool's output folder and then run the audio cloning logic on all the audio files generated by the video translator tool.
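The folder-based run with a minimum-length check could be sketched roughly as follows. This is an illustration and not the code from the fork or from OpenVoice itself: it assumes the generated audio files are WAV files, the function names are hypothetical, and the 0.5 second threshold is just an example value.

```python
import wave
from pathlib import Path

# Assumed threshold for this sketch; tune it for your own audio.
MIN_DURATION_S = 0.5


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def select_clonable_files(folder, min_duration_s=MIN_DURATION_S):
    """Split the WAV files in `folder` into (keep, skipped) by duration.

    Files shorter than `min_duration_s` go into `skipped` so that the
    cloning step is never called on them.
    """
    keep, skipped = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        if wav_duration_seconds(path) >= min_duration_s:
            keep.append(path)
        else:
            skipped.append(path)
    return keep, skipped
```

The `keep` list can then be passed file by file to the cloning step, while `skipped` is handy for logging which subtitles were left out.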

The fork is available at OpenVoice in case you want to use it in your own projects. The test1.py file can be used to clone the audio output generated by the video translator tool.