Applying Quantum Clustering to Preprocess Movie Subtitles

May 24, 2023 SwapBrain

Introduction

PinyinTube is a Chrome extension that allows users to enjoy their favorite movies while learning languages immersively at the same time. In addition to displaying dual subtitles, the app also allows users to pause the video and replay the conversation sentence by sentence to practice speaking alongside the actors.

During the development of this application, we ran into two problems: subtitles that are misaligned with the audio, and scenes where multiple actors speak at the same time. There are also unimportant subtitles that describe noises or actions, which we will henceforth refer to as the “background subtitles”. To realign the subtitles, we can use an efficient phoneme alignment model built on a two-layer LSTM-RNN architecture [1]. Before that, however, we have to pre-process the subtitles to remove background subtitles on the fly while the user is watching the movie. For this step, we use clustering methods that can group similar subtitles in real time. Clustering helps to eliminate outliers and to label subtitles that belong to the same cluster. With this approach, we were able to significantly improve the quality of subtitle alignment and the overall user experience of our application.

Quantum Clustering

Clustering algorithms aim to partition a dataset into groups, or clusters, so that data points within the same cluster are more similar to each other than to those in other clusters. Quantum clustering algorithms apply principles of quantum mechanics to this task. The idea is to exploit quantum phenomena, such as superposition and entanglement, to make clustering more efficient and more effective.

Quantum clustering algorithms are an active area of research within quantum computing. They have the potential to offer advantages over classical clustering algorithms in computational efficiency and accuracy, particularly for large and complex datasets. Since quantum hardware is still limited in processing power and needs better error correction techniques, in this project we focus on quantum-inspired clustering algorithms that can run on classical computers.

A popular conventional approach is the Parzen window estimator, where every data point is associated with a Gaussian kernel and the sum of the kernels approximates the probability density function. It has a single parameter: the width (sigma) of the Gaussian function. In contrast, in quantum clustering every data point is associated with a vector in Hilbert space, and the sum of Gaussians is treated as a wave function. By inserting this wave function into the Schrödinger equation, we can solve for the potential function. It has been shown that the quantum potential function reveals the underlying structure of the data, with its minima indicating the centers of the clusters [2].
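To make the potential function concrete, here is a minimal Python sketch, assuming the formulation in [2]: the wave function psi(x) is a sum of Gaussians of width sigma centered on the data points, and the potential is V(x) = E - d/2 + (1 / (2 sigma^2 psi(x))) * sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2)), where d is the dimension. The function name quantum_potential and the sample points are our own illustration. As a simplification, the constant E is fixed so that the smallest value among the evaluated points is zero, and the potential is only evaluated at the data points themselves.

```python
import math


def quantum_potential(points, sigma):
    """Evaluate the quantum potential V at every data point.

    points: list of equal-length coordinate tuples.
    sigma:  width of the Gaussian kernel.
    Returns a list of V values shifted so that the minimum is 0.
    """
    dim = len(points[0])
    two_sigma_sq = 2.0 * sigma ** 2
    raw = []
    for x in points:
        psi = 0.0        # wave function: sum of Gaussians
        weighted = 0.0   # sum of squared distance times Gaussian
        for xi in points:
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            g = math.exp(-d2 / two_sigma_sq)
            psi += g
            weighted += d2 * g
        # psi >= 1 because each point contributes exp(0) for itself.
        raw.append(weighted / (two_sigma_sq * psi) - dim / 2.0)
    lowest = min(raw)
    return [v - lowest for v in raw]


# Two tight groups with a single point halfway between them.
sample = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (1.5, 1.5),
          (3.0, 3.0), (3.2, 3.0), (3.0, 3.2)]
potential = quantum_potential(sample, sigma=1.0)
```

In this toy example the point halfway between the two groups sits at a higher potential than the points inside either group, which is the behavior that lets the minima of V mark cluster centers.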

In the experiments reported in [3], the data sets are either text documents or sentences from oral conversations. To extract the feature vector for each data point, the chi-squared (χ²) score is first calculated, then Principal Component Analysis (PCA) is used to reduce the feature vector to only two dimensions. The F1 score, which combines precision and recall, is used as the evaluation metric. To estimate the parameter sigma, a simple method called Pattern Search is used instead of the popular statistical approach based on k-nearest neighbors (KNN). In these experiments, quantum clustering achieves a higher F1 score than the traditional clustering method in identifying the topics of the text data points. The model can also be applied to group literary documents by their writers [3].
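Pattern search is a derivative-free method: it probes the objective on either side of the current value, moves to a probe that improves it, and shrinks the step when neither probe helps. The following is a minimal one-dimensional sketch for tuning sigma. The objective passed in is a placeholder, since the exact criterion used in [3] is not reproduced here, and the names pattern_search_sigma, start, step, and tol are hypothetical.

```python
def pattern_search_sigma(objective, start=1.0, step=0.5, tol=1e-6):
    """Minimize objective(sigma) over sigma > 0 with a 1-D pattern search."""
    sigma = start
    best = objective(sigma)
    while step > tol:
        improved = False
        # Probe one step to the right and one step to the left.
        for candidate in (sigma + step, sigma - step):
            if candidate <= 0:
                continue  # sigma must stay positive
            value = objective(candidate)
            if value < best:
                sigma, best = candidate, value
                improved = True
                break
        if not improved:
            step /= 2.0  # no probe helped: refine the step size
    return sigma


def toy_objective(sigma):
    """Stand-in for a real score, e.g. negative F1 on labeled data."""
    return (sigma - 1.5) ** 2


best_sigma = pattern_search_sigma(toy_objective)
```

In practice the toy objective would be replaced by a function that runs the clustering with the given sigma and returns a loss to minimize.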

Based on the results in [3], we are confident that quantum clustering can be used to group movie subtitles by type and by speaker. After clustering, we can eliminate the clusters with few members, which are probably the “background subtitles”, and keep only a few large clusters. The remaining clusters can represent the subtitles of the main characters. The cluster labels can then be carried forward to the next AI model, which extracts each actor’s voice from the background.
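The filtering step described above can be sketched in a few lines of Python, assuming the clustering has already produced one integer label per subtitle. The function name drop_small_clusters, the min_size threshold, and the sample subtitles and labels are all hypothetical and only serve as an illustration.

```python
from collections import Counter


def drop_small_clusters(subtitles, labels, min_size=3):
    """Keep only subtitles whose cluster has at least min_size members."""
    counts = Counter(labels)
    return [(text, label)
            for text, label in zip(subtitles, labels)
            if counts[label] >= min_size]


# Made-up subtitles with made-up cluster labels.
subtitles = ["Where are you going?", "[door slams]", "To the station.",
             "Wait for me!", "I can't.", "Hurry up!",
             "(music playing)", "Fine, I'm coming."]
labels = [0, 2, 1, 0, 1, 0, 3, 1]

kept = drop_small_clusters(subtitles, labels)
```

Here clusters 2 and 3 have a single member each, so their subtitles are dropped, while the two larger clusters are kept together with their labels for the next model.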

References

[1] Schulze-Forster et al., “Joint phoneme alignment and text-informed speech separation on highly corrupted speech”, Proceedings of ICASSP, 2020.

[2] D. Horn, A. Gottlieb, “Algorithm for data clustering in pattern recognition problems based on quantum mechanics”, Physical Review Letters, 2002.

[3] Ding Liu et al., “Analyzing documents with Quantum Clustering: A novel pattern recognition algorithm based on quantum mechanics”, Pattern Recognition Letters, 2016.