A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …

Speechx: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Two heads are better than one: A two-stage complex spectral map** approach for monaural speech enhancement

A Li, W Liu, C Zheng, C Fan, X Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
For challenging acoustic scenarios as low signal-to-noise ratios, current speech
enhancement systems usually suffer from performance bottleneck in extracting the target …

Fullsubnet: A full-band and sub-band fusion model for real-time single-channel speech enhancement

X Hao, X Su, R Horaud, X Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
This paper proposes a full-band and sub-band fusion model, named as FullSubNet, for
single-channel real-time speech enhancement. Full-band and sub-band refer to the models …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

Fullsubnet+: Channel attention fullsubnet with complex spectrograms for speech enhancement

J Chen, Z Wang, D Tuo, Z Wu, S Kang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Previously proposed FullSubNet has achieved outstanding performance in Deep Noise
Suppression (DNS) Challenge and attracted much attention. However, it still encounters …

Glance and gaze: A collaborative learning framework for single-channel speech enhancement

A Li, C Zheng, L Zhang, X Li - Applied Acoustics, 2022 - Elsevier
The capability of the human to pay attention to both coarse and fine-grained regions has
been applied to computer vision tasks. Motivated by that, we propose a collaborative …

Dual-branch attention-in-attention transformer for single-channel speech enhancement

G Yu, A Li, C Zheng, Y Guo, Y Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Curriculum learning begins to thrive in the speech enhancement area, which decouples the
original spectrum estimation task into multiple easier sub-tasks to achieve better …