An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

MEAD: A large-scale audio-visual dataset for emotional talking-face generation

K Wang, Q Wu, L Song, Z Yang, W Wu, C Qian… - … on Computer Vision, 2020 - Springer
The synthesis of natural emotional reactions is an essential criterion in vivid talking-face
video generation. This criterion is nevertheless seldom taken into consideration in previous …

BEAT: A large-scale semantic and emotional multi-modal dataset for conversational gestures synthesis

H Liu, Z Zhu, N Iwamoto, Y Peng, Z Li, Y Zhou… - European conference on …, 2022 - Springer
Achieving realistic, vivid, and human-like synthesized conversational gestures conditioned
on multi-modal data is still an unsolved problem due to the lack of available datasets …

EchoSpeech: Continuous Silent Speech Recognition on Minimally-obtrusive Eyewear Powered by Acoustic Sensing

R Zhang, K Li, Y Hao, Y Wang, Z Lai… - Proceedings of the …, 2023 - dl.acm.org
We present EchoSpeech, a minimally-obtrusive silent speech interface (SSI) powered by
low-power active acoustic sensing. EchoSpeech uses speakers and microphones mounted …

Can we read speech beyond the lips? Rethinking ROI selection for deep visual speech recognition

Y Zhang, S Yang, J Xiao, S Shan… - 2020 15th IEEE …, 2020 - ieeexplore.ieee.org
Recent advances in deep learning have heightened interest among researchers in the field
of visual speech recognition (VSR). Currently, most existing methods equate VSR with …

An experimental analysis of deep learning architectures for supervised speech enhancement

SA Nossier, J Wall, M Moniri, C Glackin, N Cannings - Electronics, 2020 - mdpi.com
Recent speech enhancement research has shown that deep learning techniques are very
effective in removing background noise. Many deep neural networks are being proposed …