Sub-word level lip reading with visual attention

KR Prajwal, T Afouras… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The goal of this paper is to learn strong lip reading models that can recognise speech in
silent videos. Most prior works deal with the open-set visual speech recognition problem by …

Audiovisual speech source separation: An overview of key methodologies

B Rivet, W Wang, SM Naqvi… - IEEE Signal Processing …, 2014 - ieeexplore.ieee.org
The separation of speech signals measured at multiple microphones in noisy and
reverberant environments using only the audio modality has limitations because there is …

An end-to-end multimodal voice activity detection using WaveNet encoder and residual networks

I Ariav, I Cohen - IEEE Journal of Selected Topics in Signal …, 2019 - ieeexplore.ieee.org
Recently, there has been growing use of deep neural networks in many modern speech-based
systems such as speaker recognition, speech enhancement, and emotion …

Detecting stairs and pedestrian crosswalks for the blind by RGBD camera

S Wang, Y Tian - 2012 IEEE International Conference on …, 2012 - ieeexplore.ieee.org
A computer vision-based wayfinding and navigation aid can improve the mobility of blind
and visually impaired people to travel independently. In this paper, we develop a new …

End-to-end audiovisual speech activity detection with bimodal recurrent neural models

F Tao, C Busso - Speech Communication, 2019 - Elsevier
Speech activity detection (SAD) plays an important role in current speech processing
systems, including automatic speech recognition (ASR). SAD is particularly difficult in …

Audio-visual voice activity detection using diffusion maps

D Dov, R Talmon, I Cohen - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
The performance of traditional voice activity detectors significantly deteriorates in the
presence of highly nonstationary noise and transient interferences. One solution is to …

A deep architecture for audio-visual voice activity detection in the presence of transients

I Ariav, D Dov, I Cohen - Signal Processing, 2018 - Elsevier
We address the problem of voice activity detection in difficult acoustic environments
including high levels of noise and transients, which are common in real life scenarios. We …

Improved active speaker detection based on optical flow

C Huang, K Koishida - … of the IEEE/CVF Conference on …, 2020 - openaccess.thecvf.com
Active speaker detection refers to the task of inferring which (if any) of the visible people in a
video is/are speaking. Existing methods based on audiovisual fusion are often confused by …

Visual voice activity detection in the wild

F Patrona, A Iosifidis, A Tefas… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
The visual voice activity detection (V-VAD) problem in unconstrained environments is
investigated in this paper. A novel method for V-VAD in the wild, exploiting local shape and …

Simultaneous-speaker voice activity detection and localization using mid-fusion of SVM and HMMs

VP Minotto, CR Jung, B Lee - IEEE Transactions on Multimedia, 2014 - ieeexplore.ieee.org
Humans can extract speech signals that they need to understand from a mixture of
background noise, interfering sound sources, and reverberation for effective communication …