Supervised speech separation based on deep learning: An overview

DL Wang, J Chen - IEEE/ACM transactions on audio, speech …, 2018 - ieeexplore.ieee.org
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …

Deep learning for environmentally robust speech recognition: An overview of recent developments

Z Zhang, J Geiger, J Pohjalainen, AED Mousa… - ACM Transactions on …, 2018 - dl.acm.org
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …

Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR

ZQ Wang, P Wang, DL Wang - IEEE/ACM transactions on …, 2020 - ieeexplore.ieee.org
This study proposes a complex spectral mapping approach for single- and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …

Improved MVDR beamforming using single-channel mask prediction networks.

H Erdogan, JR Hershey, S Watanabe, MI Mandel… - Interspeech, 2016 - isca-archive.org
Recent studies on multi-microphone speech databases indicate that it is beneficial to
perform beamforming to improve speech recognition accuracies, especially when there is a …

Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation

ZQ Wang, J Le Roux, JR Hershey - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The recently-proposed deep clustering algorithm represents a fundamental advance
towards solving the cocktail party problem in the single-channel case. When multiple …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Neural spectrospatial filtering

K Tan, ZQ Wang, DL Wang - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
As the most widely-used spatial filtering approach for multi-channel speech separation,
beamforming extracts the target speech signal arriving from a specific direction. An …

The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices

T Yoshioka, N Ito, M Delcroix, A Ogawa… - … IEEE Workshop on …, 2015 - ieeexplore.ieee.org
CHiME-3 is a research community challenge organised in 2015 to evaluate speech
recognition systems for mobile multi-microphone devices used in noisy daily environments …

The first multimodal information based speech processing (MISP) challenge: Data, tasks, baselines and results

H Chen, H Zhou, J Du, CH Lee, J Chen… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper we discuss the rationale of the Multimodal Information based Speech
Processing (MISP) Challenge, and provide a detailed description of the data recorded, the …

A four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection

Q Wang, J Du, HX Wu, J Pan, F Ma… - IEEE/ACM Transactions …, 2023 - ieeexplore.ieee.org
In this paper, we propose a novel four-stage data augmentation approach to ResNet-
Conformer based acoustic modeling for sound event localization and detection (SELD) …