Listen, denoise, action! audio-driven motion synthesis with diffusion models
Diffusion models have experienced a surge of interest as highly expressive yet efficiently
trainable probabilistic models. We show that these models are an excellent fit for …
Acoustic modeling in statistical parametric speech synthesis-from HMM to LSTM-RNN
H Zen - Proc. MLSLP, 2015 - research.google.com
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder
to render speech given a text. Typically decision tree-clustered context-dependent hidden …
Speech synthesis based on hidden Markov models
This paper gives a general overview of hidden Markov model (HMM)-based speech
synthesis, which has recently been demonstrated to be very effective in synthesizing …
Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
This paper presents a new spectral modeling method for statistical parametric speech
synthesis. In the conventional methods, high-level spectral parameters, such as mel-cepstra …
Autoregressive models for statistical parametric speech synthesis
We propose using the autoregressive hidden Markov model (HMM) for speech synthesis.
The autoregressive HMM uses the same model for parameter estimation and synthesis in a …
Autoregressive neural F0 model for statistical parametric speech synthesis
Recurrent neural networks (RNNs) have been successfully used as fundamental frequency
(F0) models for text-to-speech synthesis. However, this paper showed that a normal RNN …
Statistical parametric speech synthesis based on Gaussian process regression
This paper proposes a statistical parametric speech synthesis technique based on Gaussian
process regression (GPR). The GPR model is designed for directly predicting frame-level …
Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
Acoustic models used for statistical parametric speech synthesis typically incorporate many
modelling assumptions. It is an open question to what extent these assumptions limit the …
Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE
Given a transcription, sampling from a good model of acoustic feature trajectories should
result in plausible realizations of an utterance. However, samples from current probabilistic …
Sampling-based speech parameter generation using moment-matching networks
This paper presents sampling-based speech parameter generation using moment-matching
networks for Deep Neural Network (DNN)-based speech synthesis. Although people never …