An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier
Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration

WN Hsu, T Remez, B Shi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …

Interspeech 2022 audio deep packet loss concealment challenge

L Diener, S Sootla, S Branets, A Saabas… - arXiv preprint arXiv …, 2022 - arxiv.org
Audio Packet Loss Concealment (PLC) is the hiding of gaps in audio streams caused by
data transmission failures in packet switched networks. This is a common problem, and of …

Can audio-visual integration strengthen robustness under multimodal attacks?

Y Tian, C Xu - Proceedings of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
In this paper, we propose to make a systematic study on machines' multisensory perception
under attacks. We use the audio-visual event recognition task against multimodal …

SpeechPainter: Text-conditioned speech inpainting

Z Borsos, M Sharifi, M Tagliasacchi - arXiv preprint arXiv:2202.07273, 2022 - arxiv.org
We propose SpeechPainter, a model for filling in gaps of up to one second in speech
samples by leveraging an auxiliary textual input. We demonstrate that the model performs …

Deep prior-based audio inpainting using multi-resolution harmonic convolutional neural networks

F Miotello, M Pezzoli, L Comanducci… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this manuscript, we propose a novel method to perform audio inpainting, i.e., the
restoration of audio signals presenting multiple missing parts. Audio inpainting can be …

Diffusion-based audio inpainting

E Moliner, V Välimäki - arXiv preprint arXiv:2305.15266, 2023 - arxiv.org
Audio inpainting aims to reconstruct missing segments in corrupted recordings. Most of
existing methods produce plausible reconstructions when the gap lengths are short, but …

Audio-visual speech synthesis using vision transformer–enhanced autoencoders with ensemble of loss functions

S Ghosh, S Sarkar, S Ghosh, F Zalkow, ND Jana - Applied Intelligence, 2024 - Springer
Audio-visual speech synthesis (AVSS) has garnered attention in recent years for its utility in
the realm of audio-visual learning. AVSS transforms one speaker's speech into another's …

ReVISE: Self-supervised speech resynthesis with visual input for universal and generalized speech enhancement

WN Hsu, T Remez, B Shi, J Donley, Y Adi - arXiv preprint arXiv …, 2022 - arxiv.org
Prior works on improving speech quality with visual input typically study each type of
auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present …