Survey on automatic lip-reading in the era of deep learning

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier
In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Multimodal deep learning

J Ngiam, A Khosla, M Kim, J Nam, H Lee, AY Ng - ICML, 2011 - academia.edu
Deep networks have been successfully applied to unsupervised feature learning for single
modalities (eg, text, images or audio). In this work, we propose a novel application of deep …
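The snippet above describes learning features over multiple modalities with deep networks. As a rough illustration only (not the authors' architecture), the following sketch shows feature-level fusion of pre-extracted audio and video vectors through a small bimodal autoencoder; all dimensions and layer sizes are made-up placeholders.

```python
# Hypothetical sketch: a bimodal autoencoder learning a shared representation
# from concatenated audio and video feature vectors. Sizes are illustrative.
import torch
import torch.nn as nn

class BimodalAutoencoder(nn.Module):
    def __init__(self, audio_dim=100, video_dim=50, shared_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 128), nn.ReLU(),
            nn.Linear(128, shared_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(shared_dim, 128), nn.ReLU(),
            nn.Linear(128, audio_dim + video_dim),
        )

    def forward(self, audio, video):
        joint = torch.cat([audio, video], dim=-1)  # early (feature-level) fusion
        shared = self.encoder(joint)               # shared multimodal representation
        recon = self.decoder(shared)               # reconstruct both modalities
        return shared, recon

model = BimodalAutoencoder()
audio = torch.randn(8, 100)  # batch of audio features (e.g. spectrogram frames)
video = torch.randn(8, 50)   # batch of visual features (e.g. mouth-region encodings)
shared, recon = model(audio, video)
loss = nn.functional.mse_loss(recon, torch.cat([audio, video], dim=-1))
```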

A review of recent advances in visual speech decoding

Z Zhou, G Zhao, X Hong, M Pietikäinen - Image and Vision Computing, 2014 - Elsevier
Visual speech information plays an important role in automatic speech recognition (ASR)
especially when audio is corrupted or even inaccessible. Despite the success of audio …

Multimodal learning with deep Boltzmann machines

N Srivastava, RR Salakhutdinov - Advances in neural …, 2012 - proceedings.neurips.cc
Abstract We propose a Deep Boltzmann Machine for learning a generative model of
multimodal data. We show how to use the model to extract a meaningful representation of …

Deep multimodal learning for audio-visual speech recognition

Y Mroueh, E Marcheret, V Goel - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
In this paper, we present methods in deep multimodal learning for fusing speech and visual
modalities for Audio-Visual Automatic Speech Recognition (AV-ASR). First, we study an …
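The entry above concerns fusing acoustic and visual streams for AV-ASR. Purely as a hedged sketch of one common fusion strategy (not necessarily the one studied in the paper), the snippet below concatenates per-frame audio and visual features and classifies them jointly; the class count and feature dimensions are invented for illustration.

```python
# Hypothetical feature-level fusion classifier for audio-visual speech recognition.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, audio_dim=40, video_dim=30, num_classes=44):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + video_dim, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, audio_feats, video_feats):
        fused = torch.cat([audio_feats, video_feats], dim=-1)  # concatenate modalities
        return self.net(fused)                                 # per-frame class logits

clf = FusionClassifier()
logits = clf(torch.randn(8, 40), torch.randn(8, 30))
print(logits.shape)  # torch.Size([8, 44])
```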

OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis

I Anina, Z Zhou, G Zhao… - 2015 11th IEEE …, 2015 - ieeexplore.ieee.org
Visual speech constitutes a large part of our nonrigid facial motion and contains important
information that allows machines to interact with human users, for instance, through …

Multi-grained spatio-temporal features perceived network for event-based lip-reading

G Tan, Y Wang, H Han, Y Cao… - Proceedings of the …, 2022 - openaccess.thecvf.com
Automatic lip-reading (ALR) aims to recognize words using visual information from the
speaker's lip movements. In this work, we introduce a novel type of sensing device, event …

Deep learning-based automated lip-reading: A survey

S Fenghour, D Chen, K Guo, B Li, P Xiao - IEEE Access, 2021 - ieeexplore.ieee.org
A survey on automated lip-reading approaches is presented in this paper with the main
focus being on deep learning related methodologies which have proven to be more fruitful …

Lip reading sentences using deep learning with only visual cues

S Fenghour, D Chen, K Guo, P Xiao - IEEE Access, 2020 - ieeexplore.ieee.org
In this paper, a neural network-based lip reading system is proposed. The system is lexicon-
free and uses purely visual cues. With only a limited number of visemes as classes to …
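The snippet above mentions using a limited number of visemes as classes. The toy example below only illustrates why that class set can be small: several phonemes map to the same viseme, so distinct words can share a viseme sequence and must be disambiguated from context. The grouping shown is a simplified assumption, not the mapping used in the paper.

```python
# Toy illustration of the many-to-one phoneme-to-viseme mapping (simplified, hypothetical).
PHONEME_TO_VISEME = {
    "p": "V_bilabial", "b": "V_bilabial", "m": "V_bilabial",
    "f": "V_labiodental", "v": "V_labiodental",
    "t": "V_alveolar", "d": "V_alveolar", "s": "V_alveolar", "z": "V_alveolar",
    "k": "V_velar", "g": "V_velar",
    "aa": "V_open", "ae": "V_open",
    "iy": "V_spread", "ih": "V_spread",
}

def phonemes_to_visemes(phonemes):
    """Collapse a phoneme sequence to its viseme sequence."""
    return [PHONEME_TO_VISEME.get(p, "V_other") for p in phonemes]

# "bat" and "mat" collapse to the same viseme sequence, so a visual-only decoder
# has to resolve the ambiguity at the word or sentence level.
print(phonemes_to_visemes(["b", "ae", "t"]))
print(phonemes_to_visemes(["m", "ae", "t"]))
```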

End-to-end neuromorphic lip-reading

H Bulzomi, M Schweiker, A Gruel… - Proceedings of the …, 2023 - openaccess.thecvf.com
Human speech perception is intrinsically a multi-modal task since speech production
requires the speaker to move the lips, producing visual cues in addition to auditory …