An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Z Zhang, Y Xu, M Yu, SX Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Speech separation algorithms are often used to separate the target speech from other
interfering sources. However, purely neural network based speech separation systems often …

Audio-visual end-to-end multi-channel speech separation, dereverberation and recognition

G Li, J Deng, M Geng, Z Jin, T Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Accurate recognition of cocktail party speech containing overlapping speakers, noise and
reverberation remains a highly challenging task to date. Motivated by the invariance of …

AdVerb: Visually guided audio dereverberation

S Chowdhury, S Ghosh, S Dasgupta… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues
in addition to the reverberant sound to estimate clean audio. Although audio-only …

Generalized spatio-temporal RNN beamformer for target speech separation

Y Xu, Z Zhang, M Yu, SX Zhang, D Yu - arXiv preprint arXiv:2101.01280, 2021 - arxiv.org
Although the conventional mask-based minimum variance distortionless response (MVDR)
could reduce the non-linear distortion, the residual noise level of the MVDR separated …

Seeing through the conversation: Audio-visual speech separation based on diffusion model

S Lee, C Jung, Y Jang, J Kim… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
The objective of this work is to extract the target speaker's voice from a mixture of voices
using visual cues. Existing works on audio-visual speech separation have demonstrated …

Learning audio-visual dereverberation

C Chen, W Sun, D Harwath… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Reverberation not only degrades the quality of speech for human perception, but also
severely impacts the accuracy of automatic speech recognition. Prior work attempts to …