Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation

M Kim, J Choi, D Kim, YM Ro - arXiv preprint arXiv:2308.01831, 2023 - arxiv.org
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

M Kim, J Yeo, SJ Park, H Rha, YM Ro - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can
recognize different languages with a single trained model. As the massive multilingual …

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation

M Kim, J Choi, D Kim, YM Ro - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes a textless training method for many-to-many multilingual speech-to-
speech translation that can also benefit the transfer of pre-trained knowledge to text-based …

Visual speech recognition for low-resource languages with automatic labels from whisper model

JH Yeo, M Kim, S Watanabe, YM Ro - arXiv preprint arXiv:2309.08535, 2023 - arxiv.org
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …

Multilingual visual speech recognition with a single model by learning with discrete visual speech units

M Kim, JH Yeo, J Choi, SJ Park, YM Ro - arXiv preprint arXiv:2401.09802, 2024 - arxiv.org
This paper explores sentence-level Multilingual Visual Speech Recognition with a single
model for the first time. As the massive multilingual modeling of visual data requires huge …

Visual speech recognition using compact hypercomplex neural networks

II Panagos, G Sfikas, C Nikou - Pattern Recognition Letters, 2024 - Elsevier
Recent progress in visual speech recognition systems due to advances in deep learning
and large-scale public datasets has led to impressive performance compared to human …

Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper

JH Yeo, M Kim, S Watanabe… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

JH Yeo, CW Kim, H Kim, H Rha, S Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Lip reading aims to predict spoken language by analyzing lip movements. Despite
advancements in lip reading technologies, performance degrades when models are applied …