- Academic Search

Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Cited by 105

[PDF] hal.science

Audiovisual speech source separation: An overview of key methodologies

B Rivet, W Wang, SM Naqvi… - IEEE Signal Processing …, 2014 - ieeexplore.ieee.org

The separation of speech signals measured at multiple microphones in noisy and
reverberant environments using only the audio modality has limitations because there is …

Cited by 90

[PDF] israelcohen.com

An end-to-end multimodal voice activity detection using wavenet encoder and residual networks

I Ariav, I Cohen - IEEE Journal of Selected Topics in Signal …, 2019 - ieeexplore.ieee.org

Recently, there has been growing use of deep neural networks in many modern speech-based
systems such as speaker recognition, speech enhancement, and emotion …

Cited by 77

[PDF] psu.edu

Detecting stairs and pedestrian crosswalks for the blind by RGBD camera

S Wang, Y Tian - 2012 IEEE International Conference on …, 2012 - ieeexplore.ieee.org

A computer vision-based wayfinding and navigation aid can improve the mobility of blind
and visually impaired people to travel independently. In this paper, we develop a new …

Cited by 74

[PDF] sciencedirect.com

End-to-end audiovisual speech activity detection with bimodal recurrent neural models

F Tao, C Busso - Speech Communication, 2019 - Elsevier

Speech activity detection (SAD) plays an important role in current speech processing
systems, including automatic speech recognition (ASR). SAD is particularly difficult in …

Cited by 40

[PDF] israelcohen.com

Audio-visual voice activity detection using diffusion maps

D Dov, R Talmon, I Cohen - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org

The performance of traditional voice activity detectors significantly deteriorates in the
presence of highly nonstationary noise and transient interferences. One solution is to …

Cited by 58

[PDF] israelcohen.com

A deep architecture for audio-visual voice activity detection in the presence of transients

I Ariav, D Dov, I Cohen - Signal Processing, 2018 - Elsevier

We address the problem of voice activity detection in difficult acoustic environments
including high levels of noise and transients, which are common in real life scenarios. We …

Cited by 40

[PDF] thecvf.com

Improved active speaker detection based on optical flow

C Huang, K Koishida - … of the IEEE/CVF Conference on …, 2020 - openaccess.thecvf.com

Active speaker detection refers to the task of inferring which (if any) of the visible people in a
video is/are speaking. Existing methods based on audiovisual fusion are often confused by …

Cited by 24

[PDF] bris.ac.uk

Visual voice activity detection in the wild

F Patrona, A Iosifidis, A Tefas… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org

The visual voice activity detection (V-VAD) problem in unconstrained environments is
investigated in this paper. A novel method for V-VAD in the wild, exploiting local shape and …

Cited by 40

[PDF] researchgate.net

Simultaneous-speaker voice activity detection and localization using mid-fusion of SVM and HMMs

VP Minotto, CR Jung, B Lee - IEEE Transactions on Multimedia, 2014 - ieeexplore.ieee.org

Humans can extract speech signals that they need to understand from a mixture of
background noise, interfering sound sources, and reverberation for effective communication …

Cited by 45

Visual voice activity detection with optical flow

Sub-word level lip reading with visual attention