Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

A survey of research on lipreading technology

M Hao, M Mamut, N Yadikar, A Aysa, K Ubul - IEEE Access, 2020 - ieeexplore.ieee.org
Although automatic speech recognition (ASR) technology is mature, there are still some
unsolved problems, such as how to accurately identify what the speaker is saying in a noisy …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …

USEV: Universal speaker extraction with visual cue

Z Pan, M Ge, H Li - IEEE/ACM Transactions on Audio, Speech …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the target speaker's speech from a multi-
talker speech mixture. The prior studies focus mostly on speaker extraction from a highly …

Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

SAFARI: Speech-Associated Facial Authentication for AR/VR Settings via Robust VIbration Signatures

T Zhang, Q Ji, Z Ye, MMRR Akanda… - Proceedings of the …, 2024 - dl.acm.org
In AR/VR devices, the voice interface, serving as one of the primary AR/VR control
mechanisms, enables users to interact naturally using speeches (voice commands) for …

Audiovisual speech perception in noise in younger and older bilinguals.

A Chauvin, S Pellerin, AF Boatswain-Jacques… - Psychology and …, 2024 - psycnet.apa.org
Speech perception in noise becomes increasingly difficult with age. Similarly, bilinguals
often have difficulty with speech perception in noise in their second language (L2) due to …

SpotFast networks with memory augmented lateral transformers for lipreading

P Wiriyathammabhum - International Conference on Neural Information …, 2020 - Springer
This paper presents a novel deep learning architecture for word-level lipreading. Previous
works suggest a potential for incorporating a pretrained deep 3D Convolutional Neural …

CALLip: Lipreading using contrastive and attribute learning

Y Huang, X Liang, C Fang - Proceedings of the 29th ACM International …, 2021 - dl.acm.org
Lipreading, aiming at interpreting speech by watching the lip movements of the speaker, has
great significance in human communication and speech understanding. Despite having …