Survey on automatic lip-reading in the era of deep learning
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …
A review of recent advances in visual speech decoding
Visual speech information plays an important role in automatic speech recognition (ASR)
especially when audio is corrupted or even inaccessible. Despite the success of audio …
Lip reading sentences in the wild
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …
Lipreading with long short-term memory
Lipreading, i.e. speech recognition from visual-only recordings of a speaker's face, can be
achieved with a processing pipeline based solely on neural networks, yielding significantly …
Sign language recognition
This chapter covers the key aspects of sign-language recognition (SLR), starting with a brief
introduction to the motivations and requirements, followed by a précis of sign linguistics and …
Pseudo-convolutional policy gradient for sequence-to-sequence lip-reading
Lip-reading aims to infer the speech content from the lip movement sequence and can be
seen as a typical sequence-to-sequence (seq2seq) problem which translates the input …
Phoneme-to-viseme mappings: the good, the bad, and the ugly
Visemes are the visual equivalent of phonemes. Although not precisely defined, a common
working definition of a viseme is “a set of phonemes which have identical appearance on the …
Improving speaker-independent lipreading with domain-adversarial training
We present a lipreading system, i.e. a speech recognition system using only visual features,
which uses domain-adversarial training for speaker independence. Domain-adversarial …
Audio-visual speech recognition using deep bottleneck features and high-performance lipreading
S Tamura, H Ninomiya, N Kitaoka… - 2015 Asia-Pacific …, 2015 - ieeexplore.ieee.org
This paper develops an Audio-Visual Speech Recognition (AVSR) method, by (1) exploring
high-performance visual features, (2) applying audio and visual deep bottleneck features to …
Improving visual features for lip-reading
Automatic speech recognition systems that utilise the visual modality of speech often are
investigated within a speaker-dependent or a multi-speaker paradigm. That is, during training …