Supervised speech separation based on deep learning: An overview
Speech separation is the task of separating target speech from background interference.
Traditionally, speech separation is studied as a signal processing problem. A more recent …
Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
Complex spectral mapping for single- and multi-channel speech enhancement and robust ASR
This study proposes a complex spectral mapping approach for single- and multi-channel
speech enhancement, where deep neural networks (DNNs) are used to predict the real and …
Improved MVDR beamforming using single-channel mask prediction networks
Recent studies on multi-microphone speech databases indicate that it is beneficial to
perform beamforming to improve speech recognition accuracies, especially when there is a …
Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation
The recently-proposed deep clustering algorithm represents a fundamental advance
towards solving the cocktail party problem in the single-channel case. When multiple …
Far-field automatic speech recognition
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …
Neural spectrospatial filtering
As the most widely-used spatial filtering approach for multi-channel speech separation,
beamforming extracts the target speech signal arriving from a specific direction. An …
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices
CHiME-3 is a research community challenge organised in 2015 to evaluate speech
recognition systems for mobile multi-microphone devices used in noisy daily environments …
The first multimodal information based speech processing (MISP) challenge: Data, tasks, baselines and results
In this paper we discuss the rationale of the Multimodal Information based Speech
Processing (MISP) Challenge, and provide a detailed description of the data recorded, the …
A four-stage data augmentation approach to ResNet-Conformer based acoustic modeling for sound event localization and detection
In this paper, we propose a novel four-stage data augmentation approach to ResNet-
Conformer based acoustic modeling for sound event localization and detection (SELD) …