Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …
Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …
AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can
recognize different languages with a single trained model. As the massive multilingual …
Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation
This paper proposes a textless training method for many-to-many multilingual speech-to-
speech translation that can also benefit the transfer of pre-trained knowledge to text-based …
Visual speech recognition for low-resource languages with automatic labels from Whisper model
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …
Multilingual visual speech recognition with a single model by learning with discrete visual speech units
This paper explores sentence-level Multilingual Visual Speech Recognition with a single
model for the first time. As the massive multilingual modeling of visual data requires huge …
Visual speech recognition using compact hypercomplex neural networks
Recent progress in visual speech recognition systems due to advances in deep learning
and large-scale public datasets has led to impressive performance compared to human …
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Lip reading aims to predict spoken language by analyzing lip movements. Despite
advancements in lip reading technologies, performance degrades when models are applied …