An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while kee** the linguistic …
conversion, we change the speaker identity from one to another, while kee** the linguistic …
An overview of noise-robust automatic speech recognition
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …
with mobile devices and home entertainment systems, increasingly require automatic …
TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain
This work proposes a fully convolutional neural network (CNN) for real-time speech
enhancement in the time domain. The proposed CNN is an encoder-decoder based …
enhancement in the time domain. The proposed CNN is an encoder-decoder based …
On training targets for supervised speech separation
Formulation of speech separation as a supervised learning problem has shown
considerable promise. In its simplest form, a supervised learning algorithm, typically a deep …
considerable promise. In its simplest form, a supervised learning algorithm, typically a deep …
Multiple-target deep learning for LSTM-RNN based speech enhancement
In this study, we explore long short-term memory recurrent neural networks (LSTM-RNNs)
for speech enhancement. First, a regression LSTM-RNN approach for a direct map** from …
for speech enhancement. First, a regression LSTM-RNN approach for a direct map** from …
Interactive speech and noise modeling for speech enhancement
Speech enhancement is challenging because of the diversity of background noise types.
Most of the existing methods are focused on modelling the speech rather than the noise. In …
Most of the existing methods are focused on modelling the speech rather than the noise. In …
Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …
60 years, and a great number of methods have been proposed and applied to many …
Static and dynamic source separation using nonnegative factorizations: A unified view
Source separation models that make use of nonnegativity in their parameters have been
gaining increasing popularity in the last few years, spawning a significant number of …
gaining increasing popularity in the last few years, spawning a significant number of …
A recurrent variational autoencoder for speech enhancement
This paper presents a generative approach to speech enhancement based on a recurrent
variational autoencoder (RVAE). The deep generative speech model is trained using clean …
variational autoencoder (RVAE). The deep generative speech model is trained using clean …
Unsupervised speech enhancement using dynamical variational autoencoders
Dynamical variational autoencoders (DVAEs) are a class of deep generative models with
latent variables, dedicated to model time series of high-dimensional data. DVAEs can be …
latent variables, dedicated to model time series of high-dimensional data. DVAEs can be …