Google Scholar

A review of recent advances on deep learning methods for audio-visual speech recognition

D Ivanko, D Ryumin, A Karpov - Mathematics, 2023 - mdpi.com

This article provides a detailed review of recent advances in audio-visual speech
recognition (AVSR) methods that have been developed over the last decade (2013–2023) …

Cited by 20 · All 5 versions

[PDF] arxiv.org

SelfTalk: A self-supervised commutative training diagram to comprehend 3D talking faces

Z Peng, Y Luo, Y Shi, H Xu, X Zhu, H Liu, J He… - Proceedings of the 31st …, 2023 - dl.acm.org

Speech-driven 3D face animation technique, extending its applications to various
multimedia fields. Previous research has generated promising realistic lip movements and …

Cited by 39 · All 3 versions

Audio–visual speech recognition based on regulated transformer and spatio–temporal fusion strategy for driver assistive systems

D Ryumin, A Axyonov, E Ryumina, D Ivanko… - Expert Systems with …, 2024 - Elsevier

This article presents a research methodology for audio–visual speech recognition (AVSR) in
driver assistive systems. These systems necessitate ongoing interaction with drivers while …

Cited by 14 · All 2 versions

[PDF] thecvf.com

SynthVSR: Scaling up visual speech recognition with synthetic supervision

X Liu, E Lakomkin, K Vougioukas… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently reported state-of-the-art results in visual speech recognition (VSR) often rely on
increasingly large amounts of video data, while the publicly available transcribed video …

Cited by 23 · All 9 versions

[PDF] arxiv.org

Multilingual audio-visual speech recognition with hybrid CTC/RNN-T fast conformer

M Burchi, KC Puvvada, J Balam… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Humans are adept at leveraging visual cues from lip movements for recognizing speech in
adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow …

Cited by 13 · All 5 versions

[PDF] thecvf.com

Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Cited by 13 · All 7 versions

[PDF] thecvf.com

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Y Dai, H Chen, J Du, R Wang, S Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Advanced Audio-Visual Speech Recognition (AVSR) systems have been observed
to be sensitive to missing video frames performing even worse than single-modality models …

Cited by 6 · All 6 versions

[PDF] thecvf.com

Lost in Translation: Lip-Sync Deepfake Detection from Audio-Video Mismatch

M Bohacek, H Farid - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Highly realistic voice cloning combined with AI-powered video manipulation allows for the
creation of compelling lip-sync deepfakes where anyone can be made to say things they …

Cited by 4 · All 4 versions

[PDF] arxiv.org

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

J Hwang, M Hira, C Chen, X Zhang, Z Ni… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims
to accelerate the research and development of audio and speech technologies by providing …

Cited by 17 · All 6 versions

[PDF] arxiv.org

MLCA-AVSR: Multi-layer cross attention fusion based audio-visual speech recognition

H Wang, P Guo, P Zhou, L Xie - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

While automatic speech recognition (ASR) systems degrade significantly in noisy
environments, audio-visual speech recognition (AVSR) systems aim to complement the …

Cited by 17 · All 3 versions
