WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …

Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation

Y Luo, Z Chen, T Yoshioka - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Recent studies in deep learning-based speech separation have proven the superiority of
time-domain approaches to conventional time-frequency-based methods. Unlike the time …

Multi-speaker DOA estimation using deep convolutional networks trained with noise signals

S Chakrabarty, EAP Habets - IEEE Journal of Selected Topics …, 2019 - ieeexplore.ieee.org
Supervised learning-based methods for source localization, being data driven, can be
adapted to different acoustic conditions via training and have been shown to be robust to …

Continuous speech separation: Dataset and analysis

Z Chen, T Yoshioka, L Lu, T Zhou… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …

Continuous speech separation with conformer

S Chen, Y Wu, Z Chen, J Wu, J Li… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …

DeepFilterNet: A low complexity speech enhancement framework for full-band audio based on deep filtering

H Schröter, AN Escalante-B… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Complex-valued processing has brought deep learning-based speech enhancement and
signal extraction to a new level. Typically, the process is based on a time-frequency (TF) …

Multi-channel overlapped speech recognition with location guided speech extraction network

Z Chen, X Xiao, T Yoshioka, H Erdogan… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Although advances in close-talk speech recognition have resulted in relatively low error
rates, the recognition performance in far-field environments is still limited due to low signal …

New insights into the MVDR beamformer in room acoustics

EAP Habets, J Benesty, I Cohen… - … on Audio, Speech …, 2009 - ieeexplore.ieee.org
The minimum variance distortionless response (MVDR) beamformer, also known as
Capon's beamformer, is widely studied in the area of speech enhancement. The MVDR …

Generating nonstationary multisensor signals under a spatial coherence constraint

EAP Habets, I Cohen, S Gannot - The Journal of the Acoustical Society …, 2008 - pubs.aip.org
Noise fields encountered in real-life scenarios can often be approximated as spherical or
cylindrical noise fields. The characteristics of the noise field can be described by a spatial …

Time–frequency masking based online multi-channel speech enhancement with convolutional recurrent neural networks

S Chakrabarty, EAP Habets - IEEE Journal of Selected Topics …, 2019 - ieeexplore.ieee.org
This paper presents a time-frequency masking based online multi-channel speech
enhancement approach that uses a convolutional recurrent neural network to estimate the …