Dynamical variational autoencoders: A comprehensive review

L Girin, S Leglaive, X Bie, J Diard, T Hueber… - arXiv preprint arXiv …, 2020 - arxiv.org
Variational autoencoders (VAEs) are powerful deep generative models widely used to
represent high-dimensional complex data through a low-dimensional latent space learned …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation

JM Lemercier, J Richter, S Welker… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Diffusion models have shown a great ability at bridging the performance gap between
predictive and generative approaches for speech enhancement. We have shown that they …

Speech enhancement with score-based generative models in the complex STFT domain

S Welker, J Richter, T Gerkmann - arXiv preprint arXiv:2203.17004, 2022 - arxiv.org
Score-based generative models (SGMs) have recently shown impressive results for difficult
generative tasks such as the unconditional and conditional generation of natural images …

SELM: Speech enhancement using discrete tokens and language models

Z Wang, X Zhu, Z Zhang, YJ Lv, N Jiang… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Language models (LMs) have recently shown superior performances in various speech
generation tasks, demonstrating their powerful ability for semantic context modeling. Given …

Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural networks (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

Cold diffusion for speech enhancement

H Yen, FG Germain, G Wichern… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Diffusion models have recently shown promising results for difficult enhancement tasks such
as the conditional and unconditional restoration of natural images and audio signals. In this …

MetricGAN-U: Unsupervised speech enhancement/dereverberation based only on noisy/reverberated speech

SW Fu, C Yu, KH Hung, M Ravanelli… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Most of the deep learning-based speech enhancement models are learned in a supervised
manner, which implies that pairs of noisy and clean speech are required during training …

The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement

D de Oliveira, S Welker, J Richter… - arXiv preprint arXiv …, 2024 - arxiv.org
To obtain improved speech enhancement models, researchers often focus on increasing
performance according to specific instrumental metrics. However, when the same metric is …