In real movies, subtitles are often misaligned with the actor’s actual speech. In addition, the actor’s voice is often masked by heavy background noise and music. This made it difficult for our application to replay the actor’s voice exactly at the subtitle of interest, or to score the user’s voice against the actor’s voice. In PinyinTube PRO Version, we apply deep learning models to align the subtitles with the speech and to separate the actor’s voice from the background noise and music. Our approach is based on the research paper “Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech” by Schulze-Forster et al. Since the paper was written in 2020 and some of its code had become outdated, we updated the code and made it publicly available in our GitLab repository here. Please feel free to clone our work and send us your feedback through any of the following channels:
– Write a comment on this blog.
– Send a message using our contact form.
– Write a thread on our forum.
– Send an email to our address.