Statistical parametric speech synthesis

H Zen, K Tokuda, AW Black - Speech Communication, 2009 - Elsevier
This review gives a general overview of techniques used in statistical parametric speech
synthesis. One instance of these techniques, called hidden Markov model (HMM)-based …

Statistical parametric speech synthesis using deep neural networks

H Zen, A Senior, M Schuster - 2013 IEEE International …, 2013 - ieeexplore.ieee.org
Conventional approaches to statistical parametric speech synthesis typically use decision
tree-clustered context-dependent hidden Markov models (HMMs) to represent probability …

Statistical parametric speech synthesis incorporating generative adversarial networks

Y Saito, S Takamichi… - IEEE/ACM Transactions on …, 2017 - ieeexplore.ieee.org
A method for statistical parametric speech synthesis incorporating generative adversarial
networks (GANs) is proposed. Although powerful deep neural networks techniques can be …

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis

H Zen, H Sak - … Conference on Acoustics, Speech and Signal …, 2015 - ieeexplore.ieee.org
Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to
various speech applications including acoustic modeling for statistical parametric speech …

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis

H Zen, A Senior - … Conference on Acoustics, Speech and Signal …, 2014 - ieeexplore.ieee.org
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has
shown its potential to produce naturally-sounding synthesized speech. However, there are …

PromptTTS++: Controlling speaker identity in prompt-based text-to-speech using natural language descriptions

R Shimizu, R Yamamoto, M Kawamura… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that
allows control over speaker identity using natural language descriptions. To control speaker …

Source-filter HiFi-GAN: Fast and pitch controllable high-fidelity neural vocoder

R Yoneyama, YC Wu, T Toda - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Our previous work, the unified source-filter GAN (uSFGAN) vocoder, introduced a novel
architecture based on the source-filter theory into the parallel waveform generative …

Harvest: A High-Performance Fundamental Frequency Estimator from Speech Signals

M Morise - INTERSPEECH, 2017 - isca-archive.org
A fundamental frequency (F0) estimator named Harvest is described. The unique points of
Harvest are that it can obtain a reliable F0 contour and reduce the error that the voiced …

Singing Voice Synthesis Based on Deep Neural Networks

M Nishimura, K Hashimoto, K Oura, Y Nankaku… - Interspeech, 2016 - isca-archive.org
Singing voice synthesis techniques have been proposed based on a hidden Markov model
(HMM). In these approaches, the spectrum, excitation, and duration of singing voices are …

A comparative study of different classifiers for detecting depression from spontaneous speech

S Alghowinem, R Goecke, M Wagner… - … on Acoustics, Speech …, 2013 - ieeexplore.ieee.org
Accurate detection of depression from spontaneous speech could lead to an objective
diagnostic aid to assist clinicians to better diagnose depression. Little thought has been …