A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

SpeechBrain: A general-purpose speech toolkit

M Ravanelli, T Parcollet, P Plantinga, A Rouhe… - arXiv preprint arXiv …, 2021 - arxiv.org
SpeechBrain is an open-source and all-in-one speech toolkit. It is designed to facilitate the
research and development of neural speech processing technologies by being simple …

A survey on multi-task learning

Y Zhang, Q Yang - IEEE Transactions on Knowledge and Data …, 2021 - ieeexplore.ieee.org
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its aim is to
leverage useful information contained in multiple related tasks to help improve the …
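The snippet describes multi-task learning only in general terms. One common way to realize it, assumed here for illustration rather than taken from the survey, is hard parameter sharing: a single shared encoder feeds several task-specific heads, and training minimizes a weighted sum of the per-task losses, L = sum_t w_t * L_t. The task names (`asr`, `speaker`), the weights, and the toy numbers below are hypothetical.

```python
from typing import Dict, List


def shared_encoder(x: List[float], w_shared: List[List[float]]) -> List[float]:
    """Shared layer used by every task: ReLU of a matrix-vector product."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w_shared]


def task_head(h: List[float], w_task: List[float]) -> float:
    """Task-specific linear head applied on top of the shared representation."""
    return sum(wi * hi for wi, hi in zip(w_task, h))


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def multitask_loss(
    x: List[float],
    targets: Dict[str, float],
    w_shared: List[List[float]],
    heads: Dict[str, List[float]],
    task_weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task losses; the shared encoder runs once per input."""
    h = shared_encoder(x, w_shared)
    return sum(
        task_weights[t] * squared_error(task_head(h, heads[t]), targets[t])
        for t in targets
    )


# Hypothetical toy setup: two tasks share one identity-like encoder.
loss = multitask_loss(
    x=[1.0, 2.0],
    targets={"asr": 2.0, "speaker": 1.5},
    w_shared=[[1.0, 0.0], [0.0, 1.0]],
    heads={"asr": [1.0, 1.0], "speaker": [0.5, 0.0]},
    task_weights={"asr": 1.0, "speaker": 0.5},
)
print(loss)  # 1.5
```

Because gradients from every task flow into `w_shared`, each task acts as a source of extra supervision for the shared representation, which is the "useful information" the snippet refers to.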

Light gated recurrent units for speech recognition

M Ravanelli, P Brakel, M Omologo… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
A field that has directly benefited from the recent advances in deep learning is automatic
speech recognition (ASR). Despite the great achievements of the past decades, however, a …

Deep attractor network for single-microphone speaker separation

Z Chen, Y Luo, N Mesgarani - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Despite the overwhelming success of deep learning in various speech processing tasks, the
problem of separating simultaneous speakers in a mixture remains challenging. Two major …

Cold diffusion for speech enhancement

H Yen, FG Germain, G Wichern… - ICASSP 2023 - 2023 …, 2023 - ieeexplore.ieee.org
Diffusion models have recently shown promising results for difficult enhancement tasks such
as the conditional and unconditional restoration of natural images and audio signals. In this …

End-to-end waveform utterance enhancement for direct evaluation metrics optimization by fully convolutional neural networks

SW Fu, TW Wang, Y Tsao, X Lu… - IEEE/ACM Transactions …, 2018 - ieeexplore.ieee.org
A speech enhancement model is used to map noisy speech to clean speech. In the
training stage, an objective function is often adopted to optimize the model parameters …
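To make "an objective function is adopted to optimize the model parameters" concrete, here is a minimal sketch, assuming the simplest possible enhancement model (a single scalar gain g applied to the noisy waveform) and a sample-level mean squared error objective trained by gradient descent. Per its title, the paper itself targets evaluation metrics directly with a fully convolutional network; the MSE objective, the one-parameter model, and the toy signals here are illustrative assumptions, not that method.

```python
from typing import List


def mse(estimate: List[float], clean: List[float]) -> float:
    """Sample-level mean squared error between enhanced and clean waveforms."""
    return sum((e - c) ** 2 for e, c in zip(estimate, clean)) / len(clean)


def train_gain(noisy: List[float], clean: List[float],
               lr: float = 0.1, steps: int = 200) -> float:
    """Fit a hypothetical one-parameter 'model' y_hat = g * noisy by gradient descent."""
    g = 0.0
    n = len(clean)
    for _ in range(steps):
        # d/dg mean((g*x - c)^2) = mean(2 * (g*x - c) * x)
        grad = sum(2.0 * (g * x - c) * x for x, c in zip(noisy, clean)) / n
        g -= lr * grad
    return g


# Toy data: the "noisy" signal is the clean signal scaled by 2, so the ideal gain is 0.5.
clean = [0.5, -0.5, 1.0, -1.0]
noisy = [1.0, -1.0, 2.0, -2.0]
g = train_gain(noisy, clean)
enhanced = [g * x for x in noisy]
print(round(g, 6), round(mse(enhanced, clean), 12))  # 0.5 0.0
```

Swapping `mse` for a different objective changes what the trained parameters optimize for, which is why the choice of training objective matters when the final evaluation uses a different metric.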

Speaker-independent speech separation with deep attractor network

Y Luo, Z Chen, N Mesgarani - IEEE/ACM Transactions on …, 2018 - ieeexplore.ieee.org
Despite the recent success of deep learning for many speech processing tasks,
single-microphone, speaker-independent speech separation remains challenging for two main …

Multichannel signal processing with deep neural networks for automatic speech recognition

TN Sainath, RJ Weiss, KW Wilson, B Li… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Multichannel automatic speech recognition (ASR) systems commonly separate speech
enhancement, including localization, beamforming, and postfiltering, from acoustic …
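The snippet names beamforming as one stage of a conventional multichannel front end. A minimal sketch of the classic delay-and-sum beamformer follows, assuming the per-channel delays are already known as non-negative integer sample offsets (that is, localization has been done). This is the traditional stage the snippet describes, not the deep-neural-network approach the paper's title refers to; the signals and delays are hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Advance each channel by its integer delay (in samples) and average.

    Assumes delays[i] >= 0 is the number of samples by which the target
    arrives later at microphone i than at the reference microphone.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Keep only the time range where every aligned channel has a sample.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[t + d] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(length)
    ]


# Hypothetical two-mic example: the target reaches mic 1 one sample later,
# and the noise at the two mics happens to be anti-correlated, so it cancels.
target = [1.0, 2.0, 3.0]
mic0 = [1.2, 1.8, 3.2]       # target + noise
mic1 = [9.9, 0.8, 2.2, 2.8]  # unrelated leading sample, then delayed target - noise
out = delay_and_sum([mic0, mic1], delays=[0, 1])
print([round(v, 6) for v in out])  # [1.0, 2.0, 3.0]
```

Aligning the channels makes the target add coherently while uncorrelated noise partially averages out; the anti-correlated noise here is an idealized case chosen so the cancellation is exact.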