An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

An overview of noise-robust automatic speech recognition

J Li, L Deng, Y Gong… - IEEE/ACM Transactions …, 2014 - ieeexplore.ieee.org
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …

TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain

A Pandey, DL Wang - ICASSP 2019-2019 IEEE International …, 2019 - ieeexplore.ieee.org
This work proposes a fully convolutional neural network (CNN) for real-time speech
enhancement in the time domain. The proposed CNN is an encoder-decoder based …

On training targets for supervised speech separation

Y Wang, A Narayanan, DL Wang - IEEE/ACM Transactions on …, 2014 - ieeexplore.ieee.org
Formulation of speech separation as a supervised learning problem has shown
considerable promise. In its simplest form, a supervised learning algorithm, typically a deep …

Multiple-target deep learning for LSTM-RNN based speech enhancement

L Sun, J Du, LR Dai, CH Lee - 2017 Hands-free Speech …, 2017 - ieeexplore.ieee.org
In this study, we explore long short-term memory recurrent neural networks (LSTM-RNNs)
for speech enhancement. First, a regression LSTM-RNN approach for a direct mapping from …

Interactive speech and noise modeling for speech enhancement

C Zheng, X Peng, Y Zhang, S Srinivasan… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Speech enhancement is challenging because of the diversity of background noise types.
Most of the existing methods are focused on modelling the speech rather than the noise. In …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

Static and dynamic source separation using nonnegative factorizations: A unified view

P Smaragdis, C Fevotte, GJ Mysore… - IEEE Signal …, 2014 - ieeexplore.ieee.org
Source separation models that make use of nonnegativity in their parameters have been
gaining increasing popularity in the last few years, spawning a significant number of …

A recurrent variational autoencoder for speech enhancement

S Leglaive, X Alameda-Pineda, L Girin… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper presents a generative approach to speech enhancement based on a recurrent
variational autoencoder (RVAE). The deep generative speech model is trained using clean …

Unsupervised speech enhancement using dynamical variational autoencoders

X Bie, S Leglaive, X Alameda-Pineda… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with
latent variables, dedicated to model time series of high-dimensional data. DVAEs can be …