Audio Cloning
This is an extension of the Video Translator tool that generates cloned audio for short subtitles, improving the quality of audio output. The tool uses the OpenVoice and Melo TTS libraries to clone the audio output generated by the video translator tool.
Challenges faced while integrating audio cloning:
- Library Compatibility: The libraries were not functional on Python 3.12, which I was using. They worked on Python 3.10, so including the TTS logic in the same script was not possible.
- Short Subtitles: The libraries needed a longer sample audio to clone the voice better. Some of the subtitles were short, leading to errors in generating the audio for those subtitles. I cloned the OpenVoice repository and updated the checks that required longer audio samples. I also added a check to skip the subtitles that were too short.
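One way to handle the interpreter mismatch is to keep the cloning code in its own script and launch it with a Python 3.10 interpreter from the main tool. The sketch below assumes that approach. The interpreter path, the script name, and the `--input-dir` flag are hypothetical placeholders, not names taken from the project.

```python
import subprocess


def build_clone_command(interpreter: str, script: str, audio_dir: str) -> list[str]:
    """Build the argument list for running the cloning script under another interpreter.

    All three values are placeholders: point `interpreter` at a Python 3.10
    binary (for example one inside a dedicated virtual environment).
    The `--input-dir` flag is a hypothetical option of the cloning script.
    """
    return [str(interpreter), str(script), "--input-dir", str(audio_dir)]


def run_clone_step(interpreter: str, script: str, audio_dir: str) -> subprocess.CompletedProcess:
    """Run the cloning script in a separate process and return its result."""
    command = build_clone_command(interpreter, script, audio_dir)
    # check=True raises CalledProcessError if the child process exits non-zero.
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

Running the TTS step as a child process keeps the Python 3.12 code and the Python 3.10 dependencies in separate environments, at the cost of passing data through files or command-line arguments.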
Fixes:
- The `openvoice/se_extractor.py` file had a check to avoid saving audio segments that are shorter than 1.5 seconds. Some words like `hello`, `hi`, and `bye` were shorter than that. I updated the check to allow audio segments which are shorter than 0.5 seconds as well.
- Since the `openvoice` library's documentation explicitly says to install melo-tts, I included it in the requirements.txt file.
- Added a script where I can configure the path of the `video-translator` tool output folder and run the audio cloning logic on all the audio files generated by the video translator tool.
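The folder walk and the "skip clips that are too short" check can be sketched with the standard library alone. This is a minimal illustration, not the code in the fork: it assumes the video translator writes WAV files, and the function names and the 0.5 second default are chosen here for the example.

```python
import tempfile
import wave
from pathlib import Path

MIN_SECONDS = 0.5  # assumed cutoff for this sketch


def wav_duration_seconds(path) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def select_clonable(folder, min_seconds: float = MIN_SECONDS):
    """Split the WAV files in `folder` into (keep, skipped) lists, sorted by name."""
    keep, skipped = [], []
    for path in sorted(Path(folder).glob("*.wav")):
        if wav_duration_seconds(path) >= min_seconds:
            keep.append(path)
        else:
            skipped.append(path)
    return keep, skipped


def write_silence(path, seconds: float, rate: int = 16000) -> None:
    """Write a mono 16-bit silent WAV file; only used to demo the filter."""
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    demo_dir = Path(tempfile.mkdtemp())
    write_silence(demo_dir / "hi.wav", 0.2)
    write_silence(demo_dir / "sentence.wav", 1.0)
    keep, skipped = select_clonable(demo_dir)
    print("clone:", [p.name for p in keep])
    print("skip:", [p.name for p in skipped])
```

In a real run, the files in `keep` would be handed to the cloning step, while the files in `skipped` would keep their original generated audio.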
The fork is available at OpenVoice in case you want to use it in your own projects. The test1.py file can be used to clone the audio output generated by the video translator tool.