Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …
Real time speech enhancement in the waveform domain
We present a causal speech enhancement model working on the raw waveform that runs in
real-time on a laptop CPU. The proposed model is based on an encoder-decoder …
Music source separation with band-split RNN
The performance of music source separation (MSS) models has been greatly improved in
recent years thanks to the development of novel neural network architectures and training …
TSTNN: Two-stage transformer based neural network for speech enhancement in the time domain
In this paper, we propose a transformer-based architecture, called two-stage transformer
neural network (TSTNN) for end-to-end speech denoising in the time domain. The proposed …
Separate anything you describe
Language-queried audio source separation (LASS) is a new paradigm for computational
auditory scene analysis (CASA). LASS aims to separate a target sound from an audio …
CMGAN: Conformer-based metric GAN for speech enhancement
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …
HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …
SE-Conformer: Time-Domain Speech Enhancement Using Conformer
E Kim, H Seo - Interspeech, 2021 - isca-archive.org
Convolution-augmented transformer (conformer) has recently shown competitive results in
speech-domain applications, such as automatic speech recognition, continuous speech …
Poconet: Better speech enhancement with frequency-positional embeddings, semi-supervised conversational data, and biased loss
Neural network applications generally benefit from larger-sized models, but for current
speech enhancement models, larger scale networks often suffer from decreased robustness …
Attention wave-u-net for speech enhancement
We propose a novel application of an attention mechanism in neural speech enhancement,
by presenting a U-Net architecture with attention mechanism, which processes the raw …