Multimodal audiovisual speech recognition architecture using a three‐feature multi‐fusion method for noise‐robust systems

S Jeon, J Lee, D Yeo, YJ Lee, SJ Kim - ETRI Journal, 2024 - Wiley Online Library
Exposure to varied noisy environments impairs the recognition performance of artificial
intelligence‐based speech recognition technologies. Degraded‐performance services can …

Event-Triggered Fixed-Time Sliding Mode Control for Lip-Reading-Driven UAV: Disturbance Rejection Using Wind Field Optimization

T Lan, J Song, Z Hou, K Chen, S He… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This paper investigates the fixed-time sliding mode control (FTSMC) problem for a
quadcopter unmanned aerial vehicle (QUAV), which is driven by a lip-reading recognition …

Tailored Design of Audio-Visual Speech Recognition Models using Branchformers

D Gimeno-Gómez, CD Martínez-Hinarejos - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Audio-Visual Speech Recognition (AVSR) have led to unprecedented
achievements in the field, improving the robustness of this type of system in adverse, noisy …

Continuous lipreading based on acoustic temporal alignments

D Gimeno-Gómez, CD Martínez-Hinarejos - EURASIP Journal on Audio …, 2024 - Springer
Visual speech recognition (VSR) is a challenging task that has received increasing interest
during the last few decades. Current state of the art employs powerful end-to-end …

Comparing speaker adaptation methods for visual speech recognition for continuous Spanish

D Gimeno-Gómez, CD Martínez-Hinarejos - Applied Sciences, 2023 - mdpi.com
Visual speech recognition (VSR) is a challenging task that aims to interpret speech based
solely on lip movements. However, although remarkable results have recently been reached …

Evaluation of end-to-end continuous Spanish lipreading in different data conditions

D Gimeno-Gómez, CD Martínez-Hinarejos - Language Resources and …, 2025 - Springer
Visual speech recognition remains an open research problem where different challenges
must be considered by dispensing with the auditory sense, such as visual ambiguities, the …

IR-UWB radar-based contactless silent speech recognition of vowels, consonants, words, and phrases

S Lee, Y Shin, M Kim, J Seo - IEEE Access, 2023 - ieeexplore.ieee.org
Several sensing techniques have been proposed for silent speech recognition (SSR);
however, many of these methods require invasive processes or sensor attachment to the …

Arabic Lip Reading with Limited Data Using Deep Learning

Z Jabr, S Etemadi, N Mozayani - IEEE Access, 2024 - ieeexplore.ieee.org
Two main challenges faced by deep learning systems are related to the amount of data and
the complexity of the model concerning the number and type of layers and the number of …

Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish

D Gimeno-Gómez, CD Martínez-Hinarejos - arXiv preprint arXiv …, 2023 - arxiv.org
Different studies have shown the importance of visual cues throughout the speech
perception process. In fact, the development of audiovisual approaches has led to advances …

Extending LIP-RTVE: Towards A Large-Scale Audio-Visual Dataset for Continuous Spanish in the Wild

M Zaragozá-Portolés, D Gimeno-Gómez… - Proc. IberSPEECH …, 2024 - isca-archive.org
This article presents the extension of the LIP-RTVE dataset, a dataset dedicated to the
Spanish language for advancing audiovisual speech technologies. The annotated corpus …