Survey of deep learning paradigms for speech processing
KB Bhangale, M Kothandaraman - Wireless Personal Communications, 2022 - Springer
Over the past decades, a particular focus is given to research on machine learning
techniques for speech processing applications. However, in the past few years, research …
Deep learning for environmentally robust speech recognition: An overview of recent developments
Eliminating the negative effect of non-stationary environmental noise is a long-standing
research topic for automatic speech recognition but still remains an important challenge …
TF-GridNet: Integrating full- and sub-band modeling for speech separation
We propose TF-GridNet for speech separation. The model is a novel deep neural network
(DNN) integrating full- and sub-band modeling in the time-frequency (TF) domain. It stacks …
Attentive statistics pooling for deep speaker embedding
This paper proposes attentive statistics pooling for deep speaker embedding in text-
independent speaker verification. In conventional speaker embedding, frame-level features …
SoundSpaces 2.0: A simulation platform for visual-acoustic learning
We introduce SoundSpaces 2.0, a platform for on-the-fly geometry-based audio
rendering for 3D environments. Given a 3D mesh of a real-world environment …
FullSubNet: A full-band and sub-band fusion model for real-time single-channel speech enhancement
This paper proposes a full-band and sub-band fusion model, named as FullSubNet, for
single-channel real-time speech enhancement. Full-band and sub-band refer to the models …
CMGAN: Conformer-based metric GAN for speech enhancement
Recently, convolution-augmented transformer (Conformer) has achieved promising
performance in automatic speech recognition (ASR) and time-domain speech enhancement …
Detection and classification of acoustic scenes and events: Outcome of the DCASE 2016 challenge
Public evaluation campaigns and datasets promote active development in target research
areas, allowing direct comparison of algorithms. The second edition of the challenge on …
HiFi-GAN: High-fidelity denoising and dereverberation based on speech deep features in adversarial networks
Real-world audio recordings are often degraded by factors such as noise, reverberation,
and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to …
M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge
Recent development of speech signal processing, such as speech recognition, speaker
diarization, etc., has inspired numerous applications of speech technologies. The meeting …