
Applying Quantum Clustering to preprocess movie subtitles

Introduction

PinyinTube is a Chrome extension that allows users to enjoy their favorite movies while learning languages immersively at the same time. In addition to displaying dual subtitles, the app also allows users to pause the video and replay the conversation sentence by sentence to practice speaking alongside the actors.

During the development of this application, we encountered difficulties with the alignment of subtitles, especially when multiple actors speak at the same time. In addition, some subtitles are unimportant because they only describe noises or actions; we will henceforth refer to these as the “background subtitles”. Having noticed these issues, we took several measures to rectify them. To realign the subtitles, we can implement an efficient phoneme alignment model that uses a two-layer LSTM-RNN architecture [1]. Before that, however, we have to pre-process the subtitles to remove background subtitles on the fly while the user is watching the movie. To do this, we employed clustering methods that are capable of grouping similar subtitles in real time. Clustering helps to eliminate outliers and to label subtitles that belong to the same cluster. Through this methodology, we were able to improve the quality of subtitle alignment and enhance the overall user experience of our application.

Quantum Clustering

Clustering algorithms aim to partition a dataset into groups or clusters, where data points within the same cluster are more similar to each other than to those in other clusters. Quantum clustering algorithms leverage the principles of quantum mechanics to perform clustering tasks. The idea behind these algorithms is to exploit quantum phenomena, such as superposition and entanglement, to enhance the efficiency and effectiveness of clustering.

 

Quantum clustering algorithms are an active area of research within the field of quantum computing and have the potential to offer advantages over classical clustering algorithms in terms of computational efficiency and accuracy, particularly for large and complex datasets. Since quantum hardware is still limited in processing power and needs better error correction techniques, in this project we focus on quantum-inspired clustering algorithms that can run on classical computers.

A popular conventional approach is the Parzen window estimator, in which every data point is associated with a Gaussian kernel and the kernels are summed to approximate the probability density function. It has a single parameter: the width (sigma) of the Gaussian function. In quantum clustering, by contrast, every data point is associated with a vector in the Hilbert space. By treating the Parzen estimate as a solution of the Schrödinger equation, we can solve for the potential function. It has been shown that this quantum potential function can reveal the underlying structure of the data, where the minima indicate the centers of the clusters [2].
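To make the idea concrete, here is a minimal sketch of the potential described in [2], written in plain Python. It assumes the standard closed form from that paper, V(x) = E - d/2 + (1 / (2 sigma^2 psi(x))) * sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2)), and returns V(x) - E, since the constant E only shifts the curve. The function names, the toy points, and the value of sigma are our own illustrative choices, not part of PinyinTube.

```python
import math

def parzen_wavefunction(x, points, sigma):
    """Parzen-window estimate psi(x): a sum of Gaussians centered on the data."""
    return sum(
        math.exp(-math.dist(x, p) ** 2 / (2 * sigma ** 2)) for p in points
    )

def quantum_potential(x, points, sigma):
    """Quantum potential at x, up to the additive constant E (assumed form from [2])."""
    d = len(x)
    psi = 0.0       # sum of Gaussian kernels
    weighted = 0.0  # sum of squared distances weighted by the kernels
    for p in points:
        sq = math.dist(x, p) ** 2
        g = math.exp(-sq / (2 * sigma ** 2))
        psi += g
        weighted += sq * g
    # Note: far away from every data point psi underflows to 0; a real
    # implementation would need to guard against that division.
    return -d / 2 + weighted / (2 * sigma ** 2 * psi)

# Hypothetical toy data: two tight groups of 2-D feature vectors.
toy_points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),
              (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
toy_sigma = 1.0

v_center = quantum_potential((0.0, 0.0), toy_points, toy_sigma)
v_between = quantum_potential((2.5, 2.5), toy_points, toy_sigma)
print("potential near a group:", v_center)
print("potential between groups:", v_between)
```

On this toy data the potential is lower near a group of points than halfway between the two groups, which is the property that lets the minima mark cluster centers.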

In the experiments reported in [3], the data sets are either text documents or sentences from oral conversation. To extract the feature vector for each data point, the chi-squared (χ²) score is first calculated, then Principal Component Analysis (PCA) is used to reduce the dimension of the feature vector to only two. The F1 score is used to combine the precision and recall metrics. To estimate the parameter sigma, instead of the popular statistical approach based on k-nearest neighbors (KNN), a simple method called Pattern Search is used. In these experiments, quantum clustering achieves a higher F1 score than the traditional clustering method in identifying the topics of the different text data points. Additionally, the model can be applied to identify clusters of different writers of literature documents [3].
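As a reminder of how the F1 score combines the two metrics, here is a minimal sketch in Python. It uses the usual definition, the harmonic mean of precision and recall; how [3] maps clusters to topic labels before counting is not described here, so the counts below are purely hypothetical.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0  # convention used here when a metric is undefined
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for one topic: 8 documents correctly assigned,
# 2 wrongly assigned to the topic, 2 missed.
print(f1_score(8, 2, 2))
```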

Based on the results in [3], we are confident that quantum clustering can be used to cluster movie subtitles by type and by speaker. After clustering, we can eliminate the clusters with fewer members, which are probably the “background subtitles”, and keep only a few large clusters. The remaining clusters can represent the subtitles of the few main characters. The labels from clustering can then be carried forward to the next AI model, which extracts the actor’s voice from the background.
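The filtering step above can be sketched in a few lines of Python. This is only an illustration: the function name, the `min_size` threshold, and the sample subtitles and labels are hypothetical, and the cluster labels are assumed to come from a clustering step that has already run.

```python
from collections import Counter

def keep_large_clusters(subtitles, labels, min_size=2):
    """Keep only subtitles whose cluster has at least min_size members."""
    counts = Counter(labels)
    return [
        (text, label)
        for text, label in zip(subtitles, labels)
        if counts[label] >= min_size
    ]

# Hypothetical subtitles with cluster labels from an earlier clustering step.
sample_subtitles = ["Hello", "[door slams]", "How are you?", "Fine",
                    "[music]", "Good", "Bye"]
sample_labels = [0, 2, 1, 0, 3, 1, 0]

for text, label in keep_large_clusters(sample_subtitles, sample_labels):
    print(label, text)
```

A fixed size threshold is the simplest choice; keeping the few largest clusters instead would be an equally reasonable reading of the text.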

References

[1] Kilian Schulze-Forster et al, “Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech”, conference proceeding at ICASSP 2020.

[2] D. Horn, A. Gottlieb, “Algorithm for data clustering in pattern recognition problems based on quantum mechanics”, Physical Review Letters, 2002.

[3] Ding Liu et al, “Analyzing documents with Quantum Clustering: A novel pattern recognition algorithm based on quantum mechanics”, Pattern Recognition Letters, 2016.


The AI models that power PinyinTube’s voice and subtitle extraction.

Introduction

PinyinTube is a Chrome extension that turns watching movies into an immersive language learning experience. Its dual subtitles help learners at every level follow the content, and the romanized Chinese pronunciation (Pinyin) makes it easier to learn how Chinese words are spoken. The interactive design lets users pause and replay the content as often as they want, which makes it well suited to practicing alongside the actors. If you’re passionate about taking your language learning up a notch, be sure to upgrade to the PRO version, which lets you record your voice, compare your tone and pronunciation with those of the native actors, and track your progress. Learning a new language has never been this fun!

[Screenshot of the app showing misaligned audio]

While creating this extension, we ran into several hurdles that seemed daunting at first, and we plan to overcome them with the help of AI. One of the most significant issues was that the subtitles were often misaligned with the audio being played, which made it difficult to replay individual sentences precisely. In addition, the actor’s voice was often lost in background noise and music, or overlapped with other actors speaking simultaneously or mumbling, which made it even harder to extract the voice and match it with the user’s recorded voice. To address these challenges, we chose to deploy a series of AI models, applied in sequence:

AI roadmap

– First, Quantum Clustering is used to group the different types of text in the subtitles, such as the dialogue of different characters, background noise, and general description. This clustering allows the later steps to be applied only to the speech of the main characters. In particular, we will apply the method introduced by Ding Liu et al. in 2016 [1].

– Second, the voices of the main characters are aligned with the corresponding subtitles using the phoneme alignment method developed by Schulze-Forster et al. in 2020 [2].

– Third, using the labelled clusters and subtitles, the voices of the main actors are separated from the background noise by applying the text-informed sound separation method developed by Kevin Kilgour and others from Google Research in 2022 [3].

– Finally, because the separation step often produces corrupted and unclear audio, we suggest enhancing the audio with the generative AI technique developed by Pascual et al. in 2017 [4].

Due to the technical nature of the above topics, we will write separate blog posts to discuss the in-depth technical details. Please follow the hyperlink on each topic to go to the corresponding pages.

References

[1] Ding Liu et al, “Analyzing documents with Quantum Clustering: A novel pattern recognition algorithm based on quantum mechanics”, 2016, Pattern Recognition Letters

[2] Kilian Schulze-Forster et al, “JOINT PHONEME ALIGNMENT AND TEXT-INFORMED SPEECH SEPARATION ON HIGHLY CORRUPTED SPEECH”, 2020, conference proceeding at ICASSP 2020

[3] Kevin Kilgour et al, “Text-driven separation of arbitrary sounds”, 2022, Conference proceeding at Interspeech 2022

[4] Santiago Pascual et al, “SEGAN: Speech Enhancement Generative Adversarial Network”, Conference proceeding at Interspeech 2017


Application of Large Language Models (LLM) to subtitle alignment and actor’s voice isolation


In real movies, the subtitles are usually misaligned with the actor’s actual voice. In addition, the actor’s voice is often masked by heavy background noise and music. This made it a challenge for our application to replay the actor’s voice exactly at the subtitle of interest, or to score the user’s voice against the actor’s voice. In the PinyinTube PRO version, we apply deep learning models to align the subtitles and extract the voice from the background noise. We did this by applying the research paper “Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech” by Schulze-Forster et al. However, since the paper was written in 2020 and some of its code was outdated, we updated the code and made it available to the public in our GitLab repository. Please feel free to clone our work and send us your feedback through any of these channels:
– Write a comment on this blog.
– Send a message using our contact form.
– Write a thread on our forum.
– Send an email to our address admin@swapbrain.com

 

 

 


PinyinTube: Beyond a traditional translator

In the previous post, we introduced PinyinTube, its purpose, and how it helps SwapBrain users not only enjoy Chinese movies and videos but also learn Mandarin. In this post, we will go deeper into the technology behind PinyinTube and the role each of its components plays.

1. A good listener

First, let’s dive into how PinyinTube records the actors’ voices and other audio snippets. Unlike Google Live Caption, which requires an inherent transcript set for each video, PinyinTube can access the microphone of the user’s computer and record from the microphone input. When users press the microphone button, their voice is recorded and stored in the extension’s storage. This way, users can play back their own voice and compare it with the actor’s voice. This can be done multiple times, allowing users to improve their speaking skills. PinyinTube’s live caption is built on a JavaScript API called the Web Speech API. Once integrated into the extension’s JavaScript code, this API enables PinyinTube to capture live audio from the users.

Record your voice and replay it to compare with the actor's voice.

2. A good translator

We now partially understand how PinyinTube retrieves audio snippets and converts them to text. Yet how can those words be converted into other languages, such as English to Pinyin or Mandarin to English? Interestingly, it all starts with how human beings pick up a new language.

Learning a new language demands time and effort. The first few months prepare learners to take part in basic or intermediate conversations. With further learning and relentless practice, after a year or more, fluent daily communication with native speakers should not be a problem. This progress is thanks to the way our nervous system trains and learns. As we maintain frequent language exposure, our brain begins to “adapt” to new words, expressions, and phonetics. This process has inspired the design of one of today’s most advanced technologies, the Deep Neural Network.

A Deep Neural Network is a type of Machine Learning model inspired by the biological operation of the human brain and its billions of neurons. Thousands of companies have recently applied Machine Learning to their tech products, and SwapBrain is no exception. As an AI/Automation consulting company, we focus not only on AI services but also on Deep Neural Network application development. PinyinTube will be our first product to use Machine Learning technology as a translation tool. To do that, the extension processes the caption input and pushes it through the multiple layers of cells in the Neural Network, whose weights are learned beforehand through gradient-based training, before printing the output to the screen. This is the whole process by which PinyinTube translates Chinese/English subtitles into the user’s preferred language.
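For readers who want to see what “pushing an input through multiple layers” looks like, here is a minimal sketch of a forward pass in plain Python. It is a generic illustration and not PinyinTube’s actual translation model: the layer sizes, weights, and input values are made up, and a real network would learn its weights from data.

```python
def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through every layer; ReLU follows all but the last layer."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

# Hypothetical two-layer network: 2 inputs -> 2 hidden cells -> 1 output.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.1]),
]
print(forward([2.0, 1.0], toy_layers))
```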

Deep Neural Network is the future of SwapBrain

3. A good supporter

Besides its live translation capability, PinyinTube has some other notable features that SwapBrain customers can take advantage of at no cost. Once installed and activated, the extension automatically translates Chinese or English audio into your desired language. The caption is shown right above the video’s subtitle, along with several interactive popup buttons. You may not know that PinyinTube is built to link each audio sentence with its live caption, so you can hear a sentence spoken by the actor and read its translation at the same time. Additionally, you can click the forward/backward buttons to see the next/previous sentence; PinyinTube will jump to the sentence you choose and to the corresponding video fragment. For users who wish to use the extension as a learning tool, the playback speed can be adjusted to match their reading and listening level.

PinyinTube will also be a good language mentor. Isn't that awesome?

However, more advanced features, such as “Anywhere Captions” and “Voice Comparison”, can only be accessed through a PRO subscription. With the former, the extension can be used, for an extra charge, on streaming websites other than YouTube and Netflix. With the latter, PinyinTube can record your own audio so that it can be tested and corrected against the actor’s voice. This helps you compare your pronunciation with the actor’s voice in the video so that you can improve your language skills. Although PinyinTube is accessible to everyone, we believe spending a little extra would be largely beneficial for customers who are interested in learning Chinese or English to broaden their multilingual skills.

4. What are your thoughts?

Now you have seen the work we have been doing to bring you, our valued customers, the best version of a live translator and learning assistant. In the next few months, we are going to publish PinyinTube to the Chrome Web Store and bring you the first product we have put our best effort into. If you want to try our MVP, please enter your email at the bottom of this page. If you have any feedback or suggestions about the extension’s features, please let us know by commenting below or sending us a message via our contact page. Every comment is a great contribution to the improvement and growth of SwapBrain and PinyinTube.

Let’s keep blogging and cheers!


PinyinTube: A promising rise of a translator

1. From daily hobby to a start-up project

SwapBrain’s CEO and founder, Ms. Hung Do, has long been interested in learning natural languages. Apart from her mother tongue, Vietnamese, she speaks English, French, and German quite well. She also picked up basic Chinese during her time in Singapore, and it is now her favorite foreign language.

Over the years, Ms. Do has developed her most vital skill: Machine Learning System Design. She has built various real-time Machine Learning applications that have been used widely in many industries. This learning and working experience gave her insight into the future of Machine Learning, which she combined with her hobby of learning Chinese. It gave her the idea of building a startup that develops its apps in-house, which could be extremely helpful in the near future, when AI dominates the market. When the idea became reality, PinyinTube was the company’s first product.

2. Streaming experience refresher

PinyinTube is a promising Chrome extension that can translate Chinese subtitles into other languages for international viewers, and English subtitles for viewers who have limited English reading skills. For those who can speak Chinese but read it only in Pinyin, PinyinTube, as its name suggests, also runs the Pinyin caption right below the English subtitle. Hence, the application targets both the Chinese-speaking and the English-speaking communities.

Pinyin links a Mandarin character with its pronunciation and tone.

According to CEO Do, nowadays every video on any streaming platform has English subtitles, regardless of its origin. This is because English is the most widely used language in the world. Yet this prevents non-English speakers from enjoying their streaming time. Chinese people are heavily affected by this drawback since only 1% of them speak English. Strangely, their native language, Mandarin, which remains the second most popular globally, receives far less attention from dedicated subtitling applications. It is surprisingly uncommon to see Mandarin on the subtitle option list, not to mention Pinyin. This is hard to explain, since Pinyin is a friendlier romanized version of Mandarin for non-native speakers and including it in the subtitle list does not require much effort. PinyinTube was created to solve this problem.

With PinyinTube, movie lovers can widen their viewing options. Once installed in your web browser, it lets you understand Chinese videos with both English and Pinyin captions at the bottom of the screen, without having to pause and translate Chinese captions manually. Even for English-language videos, Pinyin still appears on your screen to help you follow along while learning English.

3. What are your thoughts?

PinyinTube is a project that we, the SwapBrain members, have been investing time and resources in, hoping to break down the barrier between Chinese and English learners. Moreover, we are expanding the project’s language coverage beyond Pinyin. Soon, we will add more languages, such as French and German, to the options list so users can choose whichever they prefer for the best streaming experience. When that dream comes true, we may give PinyinTube a new name, since the current one will no longer fit ^^.

As a startup, we depend on your feedback to shape our project to fit your needs. Do you have any suggestions on what this extension can do? Please leave a comment below and we will be right here to listen. Once again, please visit the landing page for more information and stay tuned for our debut.

Cheers and let’s keep blogging!

