Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

P Ochieng - Artificial Intelligence Review, 2023 - Springer
Deep neural network (DNN) techniques have become pervasive in domains such as
natural language processing and computer vision. They have achieved great success in …

PoCoNet: Better speech enhancement with frequency-positional embeddings, semi-supervised conversational data, and biased loss

U Isik, R Giri, N Phansalkar, JM Valin… - arXiv preprint arXiv …, 2020 - arxiv.org
Neural network applications generally benefit from larger-sized models, but for current
speech enhancement models, larger scale networks often suffer from decreased robustness …

DeepFilterNet: A low complexity speech enhancement framework for full-band audio based on deep filtering

H Schröter, AN Escalante-B… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Complex-valued processing has brought deep learning-based speech enhancement and
signal extraction to a new level. Typically, the process is based on a time-frequency (TF) …

Time domain audio visual speech separation

J Wu, Y Xu, SX Zhang, LW Chen, M Yu… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech
related tasks, such as speech recognition and speech enhancement. This paper introduces …

Differentiable consistency constraints for improved deep speech enhancement

S Wisdom, JR Hershey, K Wilson… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
In recent years, deep networks have led to dramatic improvements in speech enhancement
by framing it as a data-driven pattern recognition problem. In many modern enhancement …

Deep learning based phase reconstruction for speaker separation: A trigonometric perspective

ZQ Wang, K Tan, DL Wang - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This study investigates phase reconstruction for deep learning based monaural talker-
independent speaker separation in the short-time Fourier transform (STFT) domain. The key …

Two-step sound source separation: Training on learned latent targets

E Tzinis, S Venkataramani, Z Wang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
In this paper, we propose a two-step training procedure for source separation via a deep
neural network. In the first step we learn a transform (and its inverse) to a latent space where …

End-to-end music source separation: Is it possible in the waveform domain?

F Lluís, J Pons, X Serra - arXiv preprint arXiv:1810.12187, 2018 - arxiv.org
Most of the currently successful source separation techniques use the magnitude
spectrogram as input, and are therefore by default omitting part of the signal: the phase. To …

End-to-end multi-channel speech separation

R Gu, J Wu, SX Zhang, L Chen, Y Xu, M Yu… - arXiv preprint arXiv …, 2019 - arxiv.org
The end-to-end approach for single-channel speech separation has been studied recently
and shown promising results. This paper extends the previous approach and proposes a …