LipNet: End-to-end sentence-level lipreading

YM Assael, B Shillingford, S Whiteson… - arXiv preprint arXiv …, 2016 - arxiv.org
Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …

Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Comparing fusion models for DNN-based audiovisual continuous speech recognition

AH Abdelaziz - IEEE/ACM Transactions on Audio, Speech, and …, 2017 - ieeexplore.ieee.org
Audiovisual fusion is one of the most challenging tasks that continues to attract substantial
research interest in the field of audiovisual automatic speech recognition (AV-ASR). In the …

Pseudo-convolutional policy gradient for sequence-to-sequence lip-reading

M Luo, S Yang, S Shan, X Chen - 2020 15th IEEE International …, 2020 - ieeexplore.ieee.org
Lip-reading aims to infer the speech content from the lip movement sequence and can be
seen as a typical sequence-to-sequence (seq2seq) problem which translates the input …

Gating neural network for large vocabulary audiovisual speech recognition

F Tao, C Busso - IEEE/ACM Transactions on Audio, Speech …, 2018 - ieeexplore.ieee.org
Audio-based automatic speech recognition (A-ASR) systems are affected by noisy
conditions in real-world applications. Adding visual cues to the ASR system is an appealing …

Improving speaker-independent lipreading with domain-adversarial training

M Wand, J Schmidhuber - arXiv preprint arXiv:1708.01565, 2017 - arxiv.org
We present a Lipreading system, i.e. a speech recognition system using only visual features,
which uses domain-adversarial training for speaker independence. Domain-adversarial …

Investigations on end-to-end audiovisual fusion

M Wand, J Schmidhuber, NT Vu - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise
in the acoustic signal. Leveraging recent developments in deep neural network-based …

Aligning audiovisual features for audiovisual speech recognition

F Tao, C Busso - … Conference on Multimedia and Expo (ICME), 2018 - ieeexplore.ieee.org
Visual information can improve the performance of automatic speech recognition (ASR),
especially in the presence of background noise or different speech modes. A key problem is …

RETRACTED ARTICLE: Application of deep learning in Mandarin Chinese lip-reading recognition

G Xing, L Han, Y Zheng, M Zhao - EURASIP Journal on Wireless …, 2023 - Springer
Lip-reading is an emerging technology in recent years, and it can be applied to the field of
language recovery, criminal investigation, identity authentication, etc. We aim to recognize …

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

DK Margam, R Aralikatti, T Sharma, A Thanda… - arXiv preprint arXiv …, 2019 - arxiv.org
In recent years, deep learning based machine lipreading has gained prominence. To this
end, several architectures such as LipNet, LCANet and others have been proposed which …