Listen, denoise, action! Audio-driven motion synthesis with diffusion models

S Alexanderson, R Nagy, J Beskow… - ACM Transactions on …, 2023 - dl.acm.org
Diffusion models have experienced a surge of interest as highly expressive yet efficiently
trainable probabilistic models. We show that these models are an excellent fit for …

Acoustic modeling in statistical parametric speech synthesis - from HMM to LSTM-RNN

H Zen - Proc. MLSLP, 2015 - research.google.com
Statistical parametric speech synthesis (SPSS) combines an acoustic model and a vocoder
to render speech given a text. Typically decision tree-clustered context-dependent hidden …

Speech synthesis based on hidden Markov models

K Tokuda, Y Nankaku, T Toda, H Zen… - Proceedings of the …, 2013 - ieeexplore.ieee.org
This paper gives a general overview of hidden Markov model (HMM)-based speech
synthesis, which has recently been demonstrated to be very effective in synthesizing …

Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis

ZH Ling, L Deng, D Yu - IEEE transactions on audio, speech …, 2013 - ieeexplore.ieee.org
This paper presents a new spectral modeling method for statistical parametric speech
synthesis. In the conventional methods, high-level spectral parameters, such as mel-cepstra …

Autoregressive models for statistical parametric speech synthesis

M Shannon, H Zen, W Byrne - IEEE transactions on audio …, 2012 - ieeexplore.ieee.org
We propose using the autoregressive hidden Markov model (HMM) for speech synthesis.
The autoregressive HMM uses the same model for parameter estimation and synthesis in a …

Autoregressive neural F0 model for statistical parametric speech synthesis

X Wang, S Takaki, J Yamagishi - IEEE/ACM Transactions on …, 2018 - ieeexplore.ieee.org
Recurrent neural networks (RNNs) have been successfully used as fundamental frequency
(F0) models for text-to-speech synthesis. However, this paper showed that a normal RNN …

Statistical parametric speech synthesis based on Gaussian process regression

T Koriyama, T Nose, T Kobayashi - IEEE Journal of Selected …, 2013 - ieeexplore.ieee.org
This paper proposes a statistical parametric speech synthesis technique based on Gaussian
process regression (GPR). The GPR model is designed for directly predicting frame-level …

Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech

GE Henter, T Merritt, M Shannon… - … 2014 15th Annual …, 2014 - research.ed.ac.uk
Acoustic models used for statistical parametric speech synthesis typically incorporate many
modelling assumptions. It is an open question to what extent these assumptions limit the …

Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE

B Uria, I Murray, S Renals… - … , Speech and Signal …, 2015 - ieeexplore.ieee.org
Given a transcription, sampling from a good model of acoustic feature trajectories should
result in plausible realizations of an utterance. However, samples from current probabilistic …

Sampling-based speech parameter generation using moment-matching networks

S Takamichi, T Koriyama, H Saruwatari - arXiv preprint arXiv:1704.03626, 2017 - arxiv.org
This paper presents sampling-based speech parameter generation using moment-matching
networks for Deep Neural Network (DNN)-based speech synthesis. Although people never …