Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint arXiv:2006.12847, 2020 - arxiv.org
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …

Music source separation with band-split RNN

Y Luo, J Yu - IEEE/ACM Transactions on Audio, Speech, and …, 2023 - ieeexplore.ieee.org
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …

TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain

K Wang, B He, WP Zhu - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …

Separate anything you describe

X Liu, Q Kong, Y Zhao, H Liu, Y Yuan… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …

CMGAN: Conformer-based metric GAN for speech enhancement

R Cao, S Abdulatif, B Yang - arXiv preprint arXiv:2203.15149, 2022 - arxiv.org
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …

HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks

J Su, Z Jin, A Finkelstein - arXiv preprint arXiv:2006.05694, 2020 - arxiv.org
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …

SE-Conformer: Time-Domain Speech Enhancement Using Conformer

E Kim, H Seo - Interspeech, 2021 - isca-archive.org
Convolution-augmented transformer (conformer) has recently shown competitive results in
speech-domain applications, such as automatic speech recognition, continuous speech …

PoCoNet: Better speech enhancement with frequency-positional embeddings, semi-supervised conversational data, and biased loss

U Isik, R Giri, N Phansalkar, JM Valin… - arXiv preprint arXiv …, 2020 - arxiv.org
Neural network applications generally benefit from larger-sized models, but for current
speech enhancement models, larger scale networks often suffer from decreased robustness …

Attention Wave-U-Net for speech enhancement

R Giri, U Isik, A Krishnaswamy - 2019 IEEE Workshop on …, 2019 - ieeexplore.ieee.org
We propose a novel application of an attention mechanism in neural speech enhancement,
by presenting a U-Net architecture with attention mechanism, which processes the raw …