Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

An overview of image caption generation methods

H Wang, Y Zhang, X Yu - Computational intelligence and …, 2020 - Wiley Online Library
In recent years, with the rapid development of artificial intelligence, image caption has
gradually attracted the attention of many researchers in the field of artificial intelligence and …

Speech enhancement and dereverberation with diffusion-based generative models

J Richter, S Welker, JM Lemercier… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In this work, we build upon our previous publication and use diffusion-based generative
models for speech enhancement. We present a detailed overview of the diffusion process …

Conditional diffusion probabilistic model for speech enhancement

YJ Lu, ZQ Wang, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement is a critical component of many user-oriented audio applications, yet
current systems still suffer from distorted and unnatural outputs. While generative models …

MetricGAN+: An improved version of MetricGAN for speech enhancement

SW Fu, C Yu, TA Hsieh, P Plantinga… - arXiv preprint arXiv …, 2021 - arxiv.org
The discrepancy between the cost function used for training a speech enhancement model
and human auditory perception usually makes the quality of enhanced speech …

TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain

K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE international …, 2021 - ieeexplore.ieee.org
In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …

PHASEN: A phase-and-harmonics-aware speech enhancement network

D Yin, C Luo, Z Xiong, W Zeng - Proceedings of the AAAI conference on …, 2020 - ojs.aaai.org
Time-frequency (TF) domain masking is a mainstream approach for single-channel speech
enhancement. Recently, focuses have been put to phase prediction in addition to amplitude …

SEGAN: Speech enhancement generative adversarial network

S Pascual, A Bonafonte, J Serra - arXiv preprint arXiv:1703.09452, 2017 - arxiv.org
Current speech enhancement techniques operate on the spectral domain and/or exploit
some higher-level feature. The majority of them tackle a limited number of noise conditions …

MetricGAN: Generative adversarial networks based black-box metric scores optimization for speech enhancement

SW Fu, CF Liao, Y Tsao, SD Lin - … Conference on Machine …, 2019 - proceedings.mlr.press
Adversarial loss in a conditional generative adversarial network (GAN) is not designed to
directly optimize evaluation metrics of a target task, and thus, may not always guide the …

StoRM: A diffusion-based stochastic regeneration model for speech enhancement and dereverberation

JM Lemercier, J Richter, S Welker… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
Diffusion models have shown a great ability at bridging the performance gap between
predictive and generative approaches for speech enhancement. We have shown that they …