Text to speech synthesis: a systematic review, deep learning based architecture and future research direction
Text to Speech (TTS) synthesis is a process of translating natural language text into speech.
Pieces of recorded speech generate synthesized speech and a database is maintained for …
Acoustic features modelling for statistical parametric speech synthesis: a review
The objective of this paper is to present a detailed review of modelling various acoustic
features employed in statistical parametric speech synthesis (SPSS). As reported in the …
Glotnet—a raw waveform model for the glottal excitation in statistical parametric speech synthesis
Recently, generative neural network models which operate directly on raw audio, such as
WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover …
Fusion of spectral and prosody modelling for multilingual speech emotion conversion
S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier
The paper proposes an integrated speech emotion conversion framework developed using
speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …
A comparison between straight, glottal, and sinusoidal vocoding in statistical parametric speech synthesis
A vocoder is used to express a speech waveform with a controllable parametric
representation that can be converted back into a speech waveform. Vocoders representing …
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
Recent studies have shown that text-to-speech synthesis quality can be improved by using
glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal …
A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
S Takaki, J Yamagishi - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
In the state-of-the-art statistical parametric speech synthesis system, a speech analysis
module, e.g., STRAIGHT spectral analysis, is generally used for obtaining accurate and stable …
GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis
GlottHMM is a previously developed vocoder that has been successfully used in HMM-based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according …
High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
Achieving high quality and naturalness in statistical parametric synthesis of female voices
remains difficult despite recent advances in the study area. Vocoding is one such key …
Emotional voice conversion using a hybrid framework with speaker-adaptive DNN and particle-swarm-optimized neural network
We propose a hybrid network-based learning framework for speaker-adaptive vocal emotion
conversion, tested on three different datasets (languages), namely, EmoDB (German) …