An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

Real time speech enhancement in the waveform domain

A Defossez, G Synnaeve, Y Adi - arXiv preprint …, 2020 - arxiv.org

Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement

K Tan, DL Wang - IEEE/ACM Transactions on Audio, Speech …, 2019 - ieeexplore.ieee.org
Phase is important for perceptual quality of speech. However, it seems intractable to directly
estimate phase spectra through supervised learning due to their lack of spectrotemporal …

Wham!: Extending speech separation to noisy environments

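G Wichern, J Antognini, M Flynn, LR Zhu… - arXiv preprint …, 2019 - arxiv.org
Recent progress in separating the speech signals from multiple overlapping speakers using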
a single audio channel has brought us closer to solving the cocktail party problem. However …

Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks

H Erdogan, JR Hershey, S Watanabe… - … on Acoustics, Speech …, 2015 - ieeexplore.ieee.org
Separation of speech embedded in non-stationary interference is a challenging problem that
has recently seen dramatic improvements using deep network-based methods. Previous …

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.

C Valentini-Botinhao, X Wang, S Takaki, J Yamagishi - SSW, 2016 - isca-archive.org
The quality of text-to-speech (TTS) voices built from noisy speech is compromised.
Enhancing the speech data before training has been shown to improve quality but voices …