Sub-word level lip reading with visual attention
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …
silent videos. Most prior works deal with the open-set visual speech recognition problem by …
Audiovisual speech source separation: An overview of key methodologies
The separation of speech signals measured at multiple microphones in noisy and
reverberant environments using only the audio modality has limitations because there is …
reverberant environments using only the audio modality has limitations because there is …
An end-to-end multimodal voice activity detection using wavenet encoder and residual networks
I Ariav, I Cohen - IEEE Journal of Selected Topics in Signal …, 2019 - ieeexplore.ieee.org
Recently, there has been growing use of deep neural networks in many modern speech-
based systems such as speaker recognition, speech enhancement, and emotion …
based systems such as speaker recognition, speech enhancement, and emotion …
Detecting stairs and pedestrian crosswalks for the blind by RGBD camera
A computer vision-based wayfinding and navigation aid can improve the mobility of blind
and visually impaired people to travel independently. In this paper, we develop a new …
and visually impaired people to travel independently. In this paper, we develop a new …
End-to-end audiovisual speech activity detection with bimodal recurrent neural models
Speech activity detection (SAD) plays an important role in current speech processing
systems, including automatic speech recognition (ASR). SAD is particularly difficult in …
systems, including automatic speech recognition (ASR). SAD is particularly difficult in …
Audio-visual voice activity detection using diffusion maps
The performance of traditional voice activity detectors significantly deteriorates in the
presence of highly nonstationary noise and transient interferences. One solution is to …
presence of highly nonstationary noise and transient interferences. One solution is to …
A deep architecture for audio-visual voice activity detection in the presence of transients
We address the problem of voice activity detection in difficult acoustic environments
including high levels of noise and transients, which are common in real life scenarios. We …
including high levels of noise and transients, which are common in real life scenarios. We …
Improved active speaker detection based on optical flow
Active speaker detection refers to the task of inferring which (if any) of the visible people in a
video is/are speaking. Existing methods based on audiovisual fusion are often confused by …
video is/are speaking. Existing methods based on audiovisual fusion are often confused by …
Visual voice activity detection in the wild
The visual voice activity detection (V-VAD) problem in unconstrained environments is
investigated in this paper. A novel method for V-VAD in the wild, exploiting local shape and …
investigated in this paper. A novel method for V-VAD in the wild, exploiting local shape and …
Simultaneous-speaker voice activity detection and localization using mid-fusion of SVM and HMMs
Humans can extract speech signals that they need to understand from a mixture of
background noise, interfering sound sources, and reverberation for effective communication …
background noise, interfering sound sources, and reverberation for effective communication …