Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation

M Kim, J Choi, D Kim, YM Ro - arXiv preprint arXiv:2308.01831, 2023 - arxiv.org
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

JH Yeo, M Kim, J Choi, DH Kim… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

M Kim, J Yeo, SJ Park, H Rha, YM Ro - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can
recognize different languages with a single trained model. As the massive multilingual …

Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation

M Kim, J Choi, D Kim, YM Ro - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
This paper proposes a textless training method for many-to-many multilingual speech-to-
speech translation that can also benefit the transfer of pre-trained knowledge to text-based …

Visual speech recognition for low-resource languages with automatic labels from whisper model

JH Yeo, M Kim, S Watanabe, YM Ro - arXiv preprint arXiv:2309.08535, 2023 - arxiv.org
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …

Multilingual visual speech recognition with a single model by learning with discrete visual speech units

M Kim, JH Yeo, J Choi, SJ Park, YM Ro - arXiv preprint arXiv:2401.09802, 2024 - arxiv.org
This paper explores sentence-level Multilingual Visual Speech Recognition with a single
model for the first time. As the massive multilingual modeling of visual data requires huge …

Visual speech recognition using compact hypercomplex neural networks

II Panagos, G Sfikas, C Nikou - Pattern Recognition Letters, 2024 - Elsevier
Recent progress in visual speech recognition systems due to advances in deep learning
and large-scale public datasets has led to impressive performance compared to human …

Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper

JH Yeo, M Kim, S Watanabe… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple
languages, especially for low-resource languages that have a limited number of labeled …

Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language

JH Yeo, CW Kim, H Kim, H Rha, S Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Lip reading aims to predict spoken language by analyzing lip movements. Despite
advancements in lip reading technologies, performance degrades when models are applied …