Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

A survey of sound source localization with deep learning methods

PA Grumiaux, S Kitić, L Girin, A Guérin - The Journal of the Acoustical …, 2022 - pubs.aip.org
This article is a survey of deep learning methods for single and multiple sound source
localization, with a focus on sound source localization in indoor environments, where …

End-to-end microphone permutation and number invariant multi-channel speech separation

Y Luo, Z Chen, N Mesgarani… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
An important problem in ad-hoc microphone speech separation is how to guarantee the
robustness of a system with respect to the locations and numbers of microphones. The …

Internal language model estimation for domain-adaptive end-to-end speech recognition

Z Meng, S Parthasarathy, E Sun, Y Gaur… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
The external language models (LM) integration remains a challenging task for end-to-end
(E2E) automatic speech recognition (ASR) which has no clear division between acoustic …

FaSNet: Low-latency adaptive beamforming for multi-microphone audio processing

Y Luo, C Han, N Mesgarani, E Ceolini… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Beamforming has been extensively investigated for multi-channel audio processing tasks.
Recently, learning-based beamforming methods, sometimes called neural beamformers …

Neural spectrospatial filtering

K Tan, ZQ Wang, DL Wang - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
As the most widely-used spatial filtering approach for multi-channel speech separation,
beamforming extracts the target speech signal arriving from a specific direction. An …

Speaker-invariant training via adversarial learning

Z Meng, J Li, Z Chen, Y Zhao, V Mazalov… - … , Speech and Signal …, 2018 - ieeexplore.ieee.org
We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the
inter-talker feature variability while maximizing its senone discriminability so as to enhance …

Conditional teacher-student learning

Z Meng, J Li, Y Zhao, Y Gong - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
The teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of the T/S learning is …

A review of the state of the art and future challenges of deep learning-based beamforming

H Al Kassir, ZD Zaharis, PI Lazaridis… - IEEE …, 2022 - ieeexplore.ieee.org
The key objective of this paper is to explore the recent state-of-the-art artificial intelligence
(AI) applications on the broad field of beamforming. Hence, a multitude of AI-oriented …