WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
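Pretrained WavLM checkpoints can be loaded through the Hugging Face transformers library; below is a minimal feature-extraction sketch, assuming the `microsoft/wavlm-base-plus` checkpoint and 16 kHz input (both assumptions for illustration, not taken from the paper).

```python
import torch
from transformers import AutoFeatureExtractor, WavLMModel

# Assumed checkpoint; any WavLM variant on the Hub should work the same way.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-base-plus")
model = WavLMModel.from_pretrained("microsoft/wavlm-base-plus").eval()

wav = torch.randn(16000)  # 1 s of placeholder 16 kHz audio
inputs = extractor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# Frame-level features for downstream tasks (ASR, speaker verification, diarization, ...).
print(out.last_hidden_state.shape)   # [1, n_frames, hidden_size]
print(len(out.hidden_states))        # one tensor per transformer layer, plus the input embedding
```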
Dual-path RNN: Efficient long sequence modeling for time-domain single-channel speech separation
Recent studies in deep learning-based speech separation have proven the superiority of
time-domain approaches to conventional time-frequency-based methods. Unlike the time …
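The dual-path idea (chunk the long feature sequence, then alternate an intra-chunk and an inter-chunk RNN) can be sketched as below. The dimensions, padding scheme, and the omitted normalization layers are simplifications, not the exact architecture from the paper.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """One dual-path block: an intra-chunk BiLSTM followed by an inter-chunk BiLSTM."""
    def __init__(self, feat_dim=64, hidden=128):
        super().__init__()
        self.intra_rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.intra_proj = nn.Linear(2 * hidden, feat_dim)
        self.inter_rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.inter_proj = nn.Linear(2 * hidden, feat_dim)

    def forward(self, x):                     # x: [B, N, K, S] (features, chunk length, num chunks)
        B, N, K, S = x.shape
        # Intra-chunk processing: model short-range structure within each chunk.
        intra = x.permute(0, 3, 2, 1).reshape(B * S, K, N)
        intra = self.intra_proj(self.intra_rnn(intra)[0])
        x = x + intra.reshape(B, S, K, N).permute(0, 3, 2, 1)
        # Inter-chunk processing: model long-range structure across chunks.
        inter = x.permute(0, 2, 3, 1).reshape(B * K, S, N)
        inter = self.inter_proj(self.inter_rnn(inter)[0])
        return x + inter.reshape(B, K, S, N).permute(0, 3, 1, 2)

def segment(x, chunk=100, hop=50):
    """Split [B, N, T] features into overlapping chunks -> [B, N, chunk, num_chunks]."""
    x = torch.nn.functional.pad(x, (0, chunk))   # crude right-padding so unfold covers the tail
    return x.unfold(2, chunk, hop).permute(0, 1, 3, 2)

feats = torch.randn(2, 64, 400)                  # [batch, features, frames]
out = DualPathBlock()(segment(feats))
print(out.shape)                                 # [2, 64, 100, num_chunks]
```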
Multi-speaker DOA estimation using deep convolutional networks trained with noise signals
Supervised learning-based methods for source localization, being data driven, can be
adapted to different acoustic conditions via training and have been shown to be robust to …
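A toy version of the phase-map-to-DOA-class idea, assuming per-frame STFT phase maps of a 4-microphone array as input; the layer sizes, kernel shapes, and DOA class grid are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class PhaseMapDOA(nn.Module):
    """Toy CNN mapping a per-frame STFT phase map (mics x frequency) to DOA class posteriors."""
    def __init__(self, n_mics=4, n_freq=257, n_doa_classes=37):   # e.g. 0..180 deg in 5-deg steps
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=(2, 3), padding=(0, 1)), nn.ReLU(),
        )
        conv_out = 64 * (n_mics - 2) * n_freq
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(conv_out, 512), nn.ReLU(),
                                  nn.Linear(512, n_doa_classes))

    def forward(self, phase_map):             # phase_map: [B, 1, n_mics, n_freq]
        return self.head(self.conv(phase_map))

# A batch of single-frame phase maps for a 4-mic array with 257 frequency bins.
logits = PhaseMapDOA()(torch.randn(8, 1, 4, 257))
print(logits.shape)                           # [8, 37]; a softmax gives the posterior over DOA classes
```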
Continuous speech separation: Dataset and analysis
This paper describes a dataset and protocols for evaluating continuous speech separation
algorithms. Most prior speech separation studies use pre-segmented audio signals, which …
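Continuous separation is usually handled by running a fixed-output separator over sliding windows and stitching the per-window outputs. A rough numpy sketch of the stitching step follows; the window sizes, the correlation-based permutation fix (two outputs assumed), and the dummy separator are assumptions for illustration.

```python
import numpy as np

def css_stitch(mixture, separate, win=48000, hop=24000, n_out=2):
    """Run a two-output separator on overlapping windows, resolve the channel permutation
    between neighbouring windows on their overlap, and overlap-add the result."""
    T = len(mixture)
    outputs = np.zeros((n_out, T))
    weight = np.zeros(T) + 1e-8
    prev = None
    for start in range(0, T - win + 1, hop):
        seg = separate(mixture[start:start + win])       # -> [n_out, win]
        if prev is not None:
            # Pick the channel order that best matches the previous window on the overlap.
            overlap = win - hop
            corr = prev[:, -overlap:] @ seg[:, :overlap].T
            if corr[0, 1] + corr[1, 0] > corr[0, 0] + corr[1, 1]:
                seg = seg[::-1]
        outputs[:, start:start + win] += seg
        weight[start:start + win] += 1.0
        prev = seg
    return outputs / weight

# Toy usage with a dummy "separator" that just returns two scaled copies of the input.
rng = np.random.default_rng(0)
mix = rng.standard_normal(16000 * 10)
dummy = lambda x: np.stack([x * 0.5, x * 0.5])
print(css_stitch(mix, dummy).shape)                      # (2, 160000)
```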
Continuous speech separation with conformer
Continuous speech separation was recently proposed to deal with the overlapped speech in
natural conversations. While it was shown to significantly improve the speech recognition …
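The separator in this line of work replaces recurrent layers with Conformer blocks. Below is a compact PyTorch sketch of one such block (macaron feed-forward halves, self-attention, and a depthwise-convolution module); relative positional encoding and other details are omitted, so this is a simplified stand-in rather than the paper's exact model.

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Conformer convolution module: pointwise-GLU -> depthwise conv -> pointwise."""
    def __init__(self, d, kernel=33):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.pw1 = nn.Conv1d(d, 2 * d, 1)
        self.dw = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.bn = nn.BatchNorm1d(d)
        self.pw2 = nn.Conv1d(d, d, 1)

    def forward(self, x):                      # x: [B, T, d]
        y = self.norm(x).transpose(1, 2)       # -> [B, d, T] for Conv1d
        y = nn.functional.glu(self.pw1(y), dim=1)
        y = self.pw2(nn.functional.silu(self.bn(self.dw(y))))
        return y.transpose(1, 2)

class ConformerBlock(nn.Module):
    """Half-step FFN, self-attention, convolution module, half-step FFN, final LayerNorm."""
    def __init__(self, d=256, heads=4):
        super().__init__()
        ffn = lambda: nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d), nn.SiLU(),
                                    nn.Linear(4 * d, d))
        self.ff1, self.ff2 = ffn(), ffn()
        self.attn_norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.conv = ConvModule(d)
        self.out_norm = nn.LayerNorm(d)

    def forward(self, x):                      # x: [B, T, d] encoded mixture frames
        x = x + 0.5 * self.ff1(x)
        a = self.attn_norm(x)
        x = x + self.attn(a, a, a, need_weights=False)[0]
        x = x + self.conv(x)
        return self.out_norm(x + 0.5 * self.ff2(x))

frames = torch.randn(2, 150, 256)              # [batch, frames, feature dim]
print(ConformerBlock()(frames).shape)          # [2, 150, 256]
```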
DeepFilterNet: A low complexity speech enhancement framework for full-band audio based on deep filtering
Complex-valued processing has brought deep learning-based speech enhancement and
signal extraction to a new level. Typically, the process is based on a time-frequency (TF) …
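The deep-filtering operation itself (a short complex FIR filter predicted per TF bin and applied along time) is easy to write down; here is a numpy sketch with random stand-ins for the network outputs.

```python
import numpy as np

def apply_deep_filter(spec, coefs):
    """Apply a complex 'deep filter' along the time axis of an STFT.
    spec:  [T, F]     complex noisy spectrogram
    coefs: [T, F, N]  complex filter of order N per TF bin (below they are random stand-ins
                      for the coefficients a network would predict)
    Returns Y[t, f] = sum_i coefs[t, f, i] * spec[t - i, f]."""
    T, F = spec.shape
    N = coefs.shape[-1]
    padded = np.concatenate([np.zeros((N - 1, F), dtype=spec.dtype), spec], axis=0)
    # Stack the N most recent frames for every output frame: [T, F, N].
    history = np.stack([padded[N - 1 - i : N - 1 - i + T] for i in range(N)], axis=-1)
    return np.sum(coefs * history, axis=-1)

rng = np.random.default_rng(0)
noisy = rng.standard_normal((100, 257)) + 1j * rng.standard_normal((100, 257))
filt = rng.standard_normal((100, 257, 5)) + 1j * rng.standard_normal((100, 257, 5))
print(apply_deep_filter(noisy, filt).shape)    # (100, 257)
```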
Multi-channel overlapped speech recognition with location guided speech extraction network
Although advances in close-talk speech recognition have resulted in relatively low error
rates, the recognition performance in far-field environments is still limited due to low signal …
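Location guidance is typically injected as a directional feature that compares observed inter-channel phase differences with those expected for the target direction. The numpy sketch below implements one common variant of such a feature for a linear array; it is not necessarily the paper's exact formulation.

```python
import numpy as np

def angle_feature(stft, mic_pos, doa_deg, fs=16000, n_fft=512, c=343.0):
    """Directional feature: cosine similarity between observed inter-channel phase
    differences and those of a plane wave from the target direction.
    stft: [M, T, F] complex multi-channel STFT; mic_pos: [M] mic x-coordinates (metres).
    Returns a [T, F] feature that is large in TF bins dominated by the target direction."""
    M, T, F = stft.shape
    freqs = np.arange(F) * fs / n_fft                                   # Hz per STFT bin
    delays = mic_pos * np.cos(np.deg2rad(doa_deg)) / c                  # per-mic delay, seconds
    steering = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])   # [M, F]
    obs_ipd = stft * np.conj(stft[0:1])                                 # phase relative to mic 0
    target_ipd = steering * np.conj(steering[0:1])                      # [M, F]
    sim = np.real(np.exp(1j * np.angle(obs_ipd)) *
                  np.conj(np.exp(1j * np.angle(target_ipd[:, None, :]))))
    return sim.mean(axis=0)                                             # average over channels

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 50, 257)) + 1j * rng.standard_normal((4, 50, 257))
feat = angle_feature(X, mic_pos=np.array([0.0, 0.05, 0.10, 0.15]), doa_deg=60)
print(feat.shape)   # (50, 257)
```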
New insights into the MVDR beamformer in room acoustics
The minimum variance distortionless response (MVDR) beamformer, also known as
Capon's beamformer, is widely studied in the area of speech enhancement. The MVDR …
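For reference, the MVDR weights for one frequency bin are w = Phi_n^{-1} d / (d^H Phi_n^{-1} d), where Phi_n is the noise spatial covariance and d the steering vector. A small numpy check of the distortionless constraint, with a synthetic covariance and steering vector standing in for estimated quantities:

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR / Capon weights for one frequency bin: w = Phi_n^{-1} d / (d^H Phi_n^{-1} d).
    noise_cov: [M, M] noise spatial covariance; steering: [M] steering vector toward the target."""
    num = np.linalg.solve(noise_cov, steering)
    return num / (steering.conj() @ num)

rng = np.random.default_rng(0)
M = 4
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi_n = A @ A.conj().T + np.eye(M)             # Hermitian positive-definite noise covariance
d = np.exp(-1j * np.pi * np.arange(M) * 0.3)   # toy steering vector for a linear array
w = mvdr_weights(Phi_n, d)
print(np.abs(w.conj() @ d))                    # ~1: distortionless response toward the target
```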
Generating nonstationary multisensor signals under a spatial coherence constraint
Noise fields encountered in real-life scenarios can often be approximated as spherical or
cylindrical noise fields. The characteristics of the noise field can be described by a spatial …
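A simplified frequency-domain sketch of coherence-constrained noise generation for a spherically diffuse field (sinc coherence): independent sensor noise is mixed with a Cholesky factor of the target coherence matrix at each frequency. The array geometry and regularization constant are assumptions, and the full method in this line of work includes refinements omitted here.

```python
import numpy as np

def diffuse_noise(mic_dist, n_mics=4, n_samples=16000 * 4, fs=16000, c=343.0):
    """Generate multichannel noise whose spatial coherence approximates a spherically
    diffuse field for a uniform linear array with spacing mic_dist (metres)."""
    freqs = np.fft.rfftfreq(n_samples, 1 / fs)
    pos = np.arange(n_mics) * mic_dist
    dist = np.abs(pos[:, None] - pos[None, :])            # pairwise sensor distances
    # Independent white noise per sensor, transformed to the frequency domain.
    noise = np.fft.rfft(np.random.default_rng(0).standard_normal((n_mics, n_samples)), axis=-1)
    out = np.zeros_like(noise)
    for k, f in enumerate(freqs):
        gamma = np.sinc(2 * f * dist / c)                 # target spatial coherence at f
        C = np.linalg.cholesky(gamma + 1e-6 * np.eye(n_mics))
        out[:, k] = C @ noise[:, k]                       # impose the coherence on this bin
    return np.fft.irfft(out, n=n_samples, axis=-1)

x = diffuse_noise(mic_dist=0.05)
print(x.shape)    # (4, 64000)
```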
Time–frequency masking based online multi-channel speech enhancement with convolutional recurrent neural networks
This paper presents a time-frequency masking based online multi-channel speech
enhancement approach that uses a convolutional recurrent neural network to estimate the …
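The online part of such systems typically amounts to recursively updating mask-weighted speech and noise spatial covariance matrices, which then drive a beamformer. A numpy sketch for one frequency bin follows, with a random stand-in for the mask that a convolutional recurrent network would provide; the smoothing constant is an assumption.

```python
import numpy as np

def online_masked_covariances(frames, masks, alpha=0.95):
    """Online, mask-weighted estimate of speech and noise spatial covariance matrices.
    frames: [T, M] complex STFT vectors for one frequency bin; masks: [T] speech mask in [0, 1].
    Yields (Phi_s, Phi_n) after each frame, as used to update a beamformer online."""
    M = frames.shape[1]
    Phi_s = np.eye(M, dtype=complex) * 1e-6
    Phi_n = np.eye(M, dtype=complex) * 1e-6
    for y, m in zip(frames, masks):
        outer = np.outer(y, y.conj())
        Phi_s = alpha * Phi_s + (1 - alpha) * m * outer          # recursive speech covariance
        Phi_n = alpha * Phi_n + (1 - alpha) * (1 - m) * outer    # recursive noise covariance
        yield Phi_s, Phi_n

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4))   # 200 frames, 4 mics
mask = rng.uniform(size=200)                                             # stand-in for a network mask
for Phi_s, Phi_n in online_masked_covariances(Y, mask):
    pass                                                                  # e.g. update MVDR weights here
print(Phi_s.shape, Phi_n.shape)    # (4, 4) (4, 4)
```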