Fundamentals, present and future perspectives of speech enhancement

N Das, S Chakraborty, J Chaki, N Padhy… - International Journal of …, 2021 - Springer
Speech enhancement has substantial interest in the utilization of speaker identification,
video-conference, speech transmission through communication channels, speech-based …

Mamba in speech: Towards an alternative to self-attention

X Zhang, Q Zhang, H Liu, T Xiao, X Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer and its derivatives have achieved success in diverse tasks across computer
vision, natural language processing, and speech processing. To reduce the complexity of …

DeepMMSE: A deep learning approach to MMSE-based noise power spectral density estimation

Q Zhang, A Nicolson, M Wang… - … /ACM Transactions on …, 2020 - ieeexplore.ieee.org
An accurate noise power spectral density (PSD) tracker is an indispensable component of a
single-channel speech enhancement system. Bayesian-motivated minimum mean-square …

Deep learning for minimum mean-square error approaches to speech enhancement

A Nicolson, KK Paliwal - Speech Communication, 2019 - Elsevier
Recently, the focus of speech enhancement research has shifted from minimum mean-
square error (MMSE) approaches, like the MMSE short-time spectral amplitude (MMSE …

A time-frequency attention module for neural speech enhancement

Q Zhang, X Qian, Z Ni, A Nicolson… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
Speech enhancement plays an essential role in a wide range of speech processing
applications. Recent studies on speech enhancement tend to investigate how to effectively …

A convolutional neural network smartphone app for real-time voice activity detection

A Sehgal, N Kehtarnavaz - IEEE Access, 2018 - ieeexplore.ieee.org
This paper presents a smartphone app that performs real-time voice activity detection based
on convolutional neural network. Real-time implementation issues are discussed showing …

An empirical study on the impact of positional encoding in transformer-based monaural speech enhancement

Q Zhang, M Ge, H Zhu, E Ambikairajah… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Transformer architecture has enabled recent progress in speech enhancement. Since
Transformers are position-agnostic, positional encoding is the de facto standard component …

Time-frequency attention for monaural speech enhancement

Q Zhang, Q Song, Z Ni, A Nicolson… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Most studies on speech enhancement generally don't explicitly consider the energy
distribution of speech in time-frequency (TF) representation, which is important for accurate …

Temporal convolutional network with frequency dimension adaptive attention for speech enhancement

Q Zhang, Q Song, A Nicolson, T Lan… - Proc. Interspeech …, 2021 - drive.google.com
Despite much progress, most temporal convolutional networks (TCN) based speech
enhancement models are mainly focused on modeling the long-term temporal contextual …

Neural-free attention for monaural speech enhancement toward voice user interface for consumer electronics

M Chen, Q Zhang, Q Song, X Qian… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
The traditional graphic user interface in healthcare-oriented consumer electronics faced
challenges such as high operational complexity, time-consuming operations, and a high risk …