Large-scale visual speech recognition

B Shillingford, Y Assael, MW Hoffman, T Paine… - arXiv preprint arXiv …, 2018 - arxiv.org
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …

Extraction of visual features for lipreading

I Matthews, TF Cootes, JA Bangham… - … on Pattern Analysis …, 2002 - ieeexplore.ieee.org
The multimodal nature of speech is often ignored in human-computer interaction, but lip
deformations and other body motion, such as those of the head, convey additional …

CUAVE: A new audio-visual database for multimodal human-computer interface research

EK Patterson, S Gurbuz, Z Tufekci… - 2002 IEEE International …, 2002 - ieeexplore.ieee.org
Multimodal signal processing has become an important topic of research for overcoming
certain problems of audio-only speech processing. Audio-visual speech recognition is one …

Audio-visual integration in multimodal communication

T Chen, RR Rao - Proceedings of the IEEE, 1998 - ieeexplore.ieee.org
We review recent research that examines audio-visual integration in multimodal
communication. The topics include bimodality in human speech, human and automated lip …

Audio visual speech recognition

C Neti, G Potamianos, J Luettin, I Matthews, H Glotin… - 2000 - infoscience.epfl.ch
We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …

Audiovisual speech processing

T Chen - IEEE Signal Processing Magazine, 2001 - ieeexplore.ieee.org
We have reported activities in audiovisual speech processing, with emphasis on lip reading
and lip synchronization. These research results have shown that, with lip reading, it is …

An image transform approach for HMM based automatic lipreading

G Potamianos, HP Graf… - … Conference on Image …, 1998 - ieeexplore.ieee.org
This paper concentrates on the visual front end for hidden Markov model based automatic
lipreading. Two approaches for extracting features relevant to lipreading, given image …

Audiovisual information fusion in human–computer interfaces and intelligent environments: A survey

ST Shivappa, MM Trivedi, BD Rao - Proceedings of the IEEE, 2010 - ieeexplore.ieee.org
Microphones and cameras have been extensively used to observe and detect human
activity and to facilitate natural modes of interaction between humans and intelligent …

Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus

EK Patterson, S Gurbuz, Z Tufekci… - EURASIP Journal on …, 2002 - Springer
Strides in computer technology and the search for deeper, more powerful techniques in
signal processing have brought multimodal research to the forefront in recent years. Audio …

Multiresolution and multimodal speech recognition with transformers

G Paraskevopoulos, S Parthasarathy, A Khare… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a
Transformer-based architecture. We particularly focus on the scene context provided by the …