Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis
P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …
Wham!: Extending speech separation to noisy environments
G Wichern, J Antognini, M Flynn, LR Zhu… - ar…
… speakers using
a single audio channel has brought us closer to solving the cocktail party problem. However …
Poconet: Better speech enhancement with frequency-positional embeddings, semi-supervised conversational data, and biased loss
Neural network applications generally benefit from larger-sized models, but for current
speech enhancement models, larger scale networks often suffer from decreased robustness …
DeepFilterNet: A low complexity speech enhancement framework for full-band audio based on deep filtering
H Schroter, AN Escalante-B… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Complex-valued processing has brought deep learning-based speech enhancement and
signal extraction to a new level. Typically, the process is based on a time-frequency (TF) …
Time domain audio visual speech separation
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech
related tasks, such as speech recognition and speech enhancement. This paper introduces …
Differentiable consistency constraints for improved deep speech enhancement
In recent years, deep networks have led to dramatic improvements in speech enhancement
by framing it as a data-driven pattern recognition problem. In many modern enhancement …
Deep learning based phase reconstruction for speaker separation: A trigonometric perspective
This study investigates phase reconstruction for deep learning based monaural talker-
independent speaker separation in the short-time Fourier transform (STFT) domain. The key …
Two-step sound source separation: Training on learned latent targets
In this paper, we propose a two-step training procedure for source separation via a deep
neural network. In the first step we learn a transform (and its inverse) to a latent space where …
End-to-end music source separation: Is it possible in the waveform domain?
Most of the currently successful source separation techniques use the magnitude
spectrogram as input, and are therefore by default omitting part of the signal: the phase. To …
End-to-end multi-channel speech separation
The end-to-end approach for single-channel speech separation has been studied recently
and shown promising results. This paper extended the previous approach and proposed a …