Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

A review of recent advances in visual speech decoding

Z Zhou, G Zhao, X Hong, M Pietikäinen - Image and Vision Computing, 2014 - Elsevier
Visual speech information plays an important role in automatic speech recognition (ASR)
especially when audio is corrupted or even inaccessible. Despite the success of audio …

Lip reading sentences in the wild

J Son Chung, A Senior, O Vinyals… - Proceedings of the …, 2017 - openaccess.thecvf.com
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Lipreading with long short-term memory

M Wand, J Koutník… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be
achieved with a processing pipeline based solely on neural networks, yielding significantly …

Sign language recognition

H Cooper, B Holt, R Bowden - Visual Analysis of Humans: Looking at …, 2011 - Springer
This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief
introduction to the motivations and requirements, followed by a précis of sign linguistics and …

Pseudo-convolutional policy gradient for sequence-to-sequence lip-reading

M Luo, S Yang, S Shan, X Chen - 2020 15th IEEE International …, 2020 - ieeexplore.ieee.org
Lip-reading aims to infer the speech content from the lip movement sequence and can be
seen as a typical sequence-to-sequence (seq2seq) problem which translates the input …

Phoneme-to-viseme mappings: the good, the bad, and the ugly

HL Bear, R Harvey - Speech Communication, 2017 - Elsevier
Visemes are the visual equivalent of phonemes. Although not precisely defined, a common
working definition of a viseme is “a set of phonemes which have identical appearance on the …

Improving speaker-independent lipreading with domain-adversarial training

M Wand, J Schmidhuber - arXiv preprint arXiv:1708.01565, 2017 - arxiv.org
We present a Lipreading system, i.e. a speech recognition system using only visual features,
which uses domain-adversarial training for speaker independence. Domain-adversarial …

Audio-visual speech recognition using deep bottleneck features and high-performance lipreading

S Tamura, H Ninomiya, N Kitaoka… - 2015 Asia-Pacific …, 2015 - ieeexplore.ieee.org
This paper develops an Audio-Visual Speech Recognition (AVSR) method, by (1) exploring
high-performance visual features, (2) applying audio and visual deep bottleneck features to …

Improving visual features for lip-reading

Y Lan, BJ Theobald, R Harvey, EJ Ong… - Auditory-visual speech …, 2010 - isca-archive.org
Automatic speech recognition systems that utilise the visual modality of speech often are
investigated within a speaker-dependent or a multi-speaker paradigm. That is, during training …