An overview of deep-learning-based audio-visual speech enhancement and separation
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
extract either one or more target speech signals, respectively, from a mixture of sounds …
Deep audio-visual learning: A survey
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …
modalities, has drawn considerable attention since deep learning started to be used …
Mead: A large-scale audio-visual dataset for emotional talking-face generation
The synthesis of natural emotional reactions is an essential criterion in vivid talking-face
video generation. This criterion is nevertheless seldom taken into consideration in previous …
video generation. This criterion is nevertheless seldom taken into consideration in previous …
Beat: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis
Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned
on multi-modal data is still an unsolved problem due to the lack of available datasets …
on multi-modal data is still an unsolved problem due to the lack of available datasets …
EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing
We present EchoSpeech, a minimally-obtrusive silent speech interface (SSI) powered by
low-power active acoustic sensing. EchoSpeech uses speakers and microphones mounted …
low-power active acoustic sensing. EchoSpeech uses speakers and microphones mounted …
Can we read speech beyond the lips? rethinking roi selection for deep visual speech recognition
Recent advances in deep learning have heightened interest among researchers in the field
of visual speech recognition (VSR). Currently, most existing methods equate VSR with …
of visual speech recognition (VSR). Currently, most existing methods equate VSR with …
An experimental analysis of deep learning architectures for supervised speech enhancement
Recent speech enhancement research has shown that deep learning techniques are very
effective in removing background noise. Many deep neural networks are being proposed …
effective in removing background noise. Many deep neural networks are being proposed …