Watch or listen: Robust audio-visual speech recognition with visual corruption modeling and reliability scoring
This paper deals with Audio-Visual Speech Recognition (AVSR) under a multimodal input
corruption situation where audio inputs and visual inputs are both corrupted, which is not …
VatLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Although speech is a simple and effective way for humans to communicate with the outside
world, a more realistic speech interaction contains multimodal information, e.g., vision, text …
Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …
Lip to speech synthesis with visual context attentional GAN
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual
Context Attentional GAN (VCA-GAN), which can jointly model local and global lip …
Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques
SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier
Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …
Speaker-adaptive lip reading with user-dependent padding
Lip reading aims to predict speech based on lip movements alone. As it focuses on visual
information to model the speech, its performance is inherently sensitive to personal lip …
Many-to-many spoken language translation via unified speech and text representation learning with unit-to-unit translation
In this paper, we propose a method to learn unified representations of multilingual speech
and text with a single model, especially focusing on the purpose of speech synthesis. We …
Prompt tuning of deep neural networks for speaker-adaptive visual speech recognition
Visual Speech Recognition (VSR) aims to infer text from speech based on lip
movements alone. As it focuses on visual information to model the speech, its performance …
Intelligible lip-to-speech synthesis with speech units
In this paper, we propose a novel Lip-to-Speech synthesis (L2S) framework for synthesizing
intelligible speech from a silent lip movement video. Specifically, to complement the …
AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip
movements. VSR is regarded as a challenging task because of the insufficient information …