Large-scale visual speech recognition
This work presents a scalable solution to open-vocabulary visual speech recognition. To
achieve this, we constructed the largest existing visual speech recognition dataset …
Extraction of visual features for lipreading
The multimodal nature of speech is often ignored in human-computer interaction, but lip
deformations and other body motion, such as those of the head, convey additional …
CUAVE: A new audio-visual database for multimodal human-computer interface research
EK Patterson, S Gurbuz, Z Tufekci… - 2002 IEEE International …, 2002 - ieeexplore.ieee.org
Multimodal signal processing has become an important topic of research for overcoming
certain problems of audio-only speech processing. Audio-visual speech recognition is one …
Audio-visual integration in multimodal communication
T Chen, RR Rao - Proceedings of the IEEE, 1998 - ieeexplore.ieee.org
We review recent research that examines audio-visual integration in multimodal
communication. The topics include bimodality in human speech, human and automated lip …
Audio visual speech recognition
We have made significant progress in automatic speech recognition (ASR) for well-defined
applications like dictation and medium vocabulary transaction processing tasks in relatively …
Audiovisual speech processing
T Chen - IEEE signal processing magazine, 2001 - ieeexplore.ieee.org
We have reported activities in audiovisual speech processing, with emphasis on lip reading
and lip synchronization. These research results have shown that, with lip reading, it is …
An image transform approach for HMM based automatic lipreading
This paper concentrates on the visual front end for hidden Markov model based automatic
lipreading. Two approaches for extracting features relevant to lipreading, given image …
Audiovisual information fusion in human–computer interfaces and intelligent environments: A survey
Microphones and cameras have been extensively used to observe and detect human
activity and to facilitate natural modes of interaction between humans and intelligent …
Moving-talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus
EK Patterson, S Gurbuz, Z Tufekci… - EURASIP Journal on …, 2002 - Springer
Strides in computer technology and the search for deeper, more powerful techniques in
signal processing have brought multimodal research to the forefront in recent years. Audio …
Multiresolution and multimodal speech recognition with transformers
This paper presents an audio visual automatic speech recognition (AV-ASR) system using a
Transformer-based architecture. We particularly focus on the scene context provided by the …