Past review, current progress, and challenges ahead on the cocktail party problem

Y Qian, C Weng, X Chang, S Wang, D Yu - Frontiers of Information …, 2018 - Springer
The cocktail party problem, i.e., tracing and recognizing the speech of a specific speaker
when multiple speakers talk simultaneously, is one of the critical problems yet to be solved …

Recent developments in speech enhancement in the short-time Fourier transform domain

M Parchami, WP Zhu, B Champagne… - IEEE Circuits and …, 2016 - ieeexplore.ieee.org
In this paper, we present an overview on the topic of noise reduction in the short-time Fourier
transform (STFT) domain. First, we briefly review the conventional literature in the single- and …

A consolidated perspective on multimicrophone speech enhancement and source separation

S Gannot, E Vincent… - … /ACM Transactions on …, 2017 - ieeexplore.ieee.org
Speech enhancement and separation are core problems in audio signal processing, with
commercial applications in devices as diverse as mobile phones, conference call systems …

Improved MVDR beamforming using single-channel mask prediction networks

H Erdogan, JR Hershey, S Watanabe, MI Mandel… - Interspeech, 2016 - isca-archive.org
Recent studies on multi-microphone speech databases indicate that it is beneficial to
perform beamforming to improve speech recognition accuracies, especially when there is a …

Multi-channel deep clustering: Discriminative spectral and spatial embeddings for speaker-independent speech separation

ZQ Wang, J Le Roux, JR Hershey - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
The recently-proposed deep clustering algorithm represents a fundamental advance
towards solving the cocktail party problem in the single-channel case. When multiple …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise

T Higuchi, N Ito, T Yoshioka… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
This paper considers acoustic beamforming for noise robust automatic speech recognition
(ASR). A beamformer attenuates background noise by enhancing sound components …

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition

AS Subramanian, C Weng, S Watanabe, M Yu… - Computer Speech & …, 2022 - Elsevier
Multi-source localization is an important and challenging technique for multi-talker
conversation analysis. This paper proposes a novel supervised learning method using deep …

ESPnet-SE: End-to-end speech enhancement and separation toolkit designed for ASR integration

C Li, J Shi, W Zhang, AS Subramanian… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
We present ESPnet-SE, which is designed for the quick development of speech
enhancement and speech separation systems in a single framework, along with the optional …

Front-end processing for the CHiME-5 dinner party scenario

C Boeddeker, J Heitkaemper… - CHiME5 Workshop …, 2018 - isca-archive.org
This contribution presents a speech enhancement system for the CHiME-5 Dinner Party
Scenario. The front-end employs multi-channel linear time-variant filtering and achieves its …