Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

X Chang, B Yan, K Choi, JW Jung, Y Lu… - ICASSP 2024 …, 2024 - ieeexplore.ieee.org
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

J Choi, SJ Park, M Kim, YM Ro - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper proposes a novel direct Audio-Visual Speech to Audio-Visual Speech
Translation (AV2AV) framework where the input and output of the system are multimodal (ie …

Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation

M Kim, J Yeo, SJ Park, H Rha, YM Ro - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
This paper explores sentence-level multilingual Visual Speech Recognition (VSR) that can
recognize different languages with a single trained model. As the massive multilingual …

Speech sense disambiguation: Tackling homophone ambiguity in end-to-end speech translation

T Yu, X Liu, L Ding, K Chen, D Tao… - Proceedings of the 62nd …, 2024 - aclanthology.org
End-to-end speech translation (ST) presents notable disambiguation challenges as it
necessitates simultaneous cross-modal and cross-lingual transformations. While word …

Towards practical and efficient image-to-speech captioning with vision-language pre-training and multi-modal tokens

M Kim, J Choi, S Maiti, JH Yeo… - ICASSP 2024 …, 2024 - ieeexplore.ieee.org
In this paper, we propose methods to build a powerful and efficient Image-to-Speech
captioning (Im2Sp) model. To this end, we start with importing the rich knowledge related to …

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

X Chang, J Shi, J Tian, Y Wu, Y Tang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Representing speech and audio signals in discrete units has become a compelling
alternative to traditional high-dimensional feature vectors. Numerous studies have …

Multilingual visual speech recognition with a single model by learning with discrete visual speech units

M Kim, JH Yeo, J Choi, SJ Park, YM Ro - arXiv preprint arXiv:2401.09802, 2024 - arxiv.org
This paper explores sentence-level Multilingual Visual Speech Recognition with a single
model for the first time. As the massive multilingual modeling of visual data requires huge …

Translatotron 3: Speech to speech translation with monolingual data

E Nachmani, A Levkovitch, Y Ding… - ICASSP 2024 …, 2024 - ieeexplore.ieee.org
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-
speech translation from monolingual speech-text datasets by combining masked …

TMT: Tri-modal translation between speech, image, and text by processing different modalities as different languages

M Kim, J Jung, H Rha, S Maiti, S Arora, X Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
The capability to jointly process multi-modal information is becoming an essential task.
However, the limited number of paired multi-modal data and the large computational …