Text to speech synthesis: a systematic review, deep learning based architecture and future research direction

F Khanam, FA Munmun, NA Ritu, AK Saha… - Journal of Advances in …, 2022 - academia.edu
Text to Speech (TTS) synthesis is a process of translating natural language text into speech.
Pieces of recorded speech generate synthesized speech and a database is maintained for …

Acoustic features modelling for statistical parametric speech synthesis: a review

N Adiga, SRM Prasanna - IETE Technical Review, 2019 - Taylor & Francis
The objective of this paper is to present a detailed review of modelling various acoustic
features employed in statistical parametric speech synthesis (SPSS). As reported in the …

Glotnet—a raw waveform model for the glottal excitation in statistical parametric speech synthesis

L Juvela, B Bollepalli, V Tsiaras… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
Recently, generative neural network models which operate directly on raw audio, such as
WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover …

Fusion of spectral and prosody modelling for multilingual speech emotion conversion

S Vekkot, D Gupta - Knowledge-Based Systems, 2022 - Elsevier
The paper proposes an integrated speech emotion conversion framework developed using
speaker-independent mixed-lingual training. The key contribution of the work is non-parallel …

A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis

M Airaksinen, L Juvela, B Bollepalli… - … on Audio, Speech …, 2018 - ieeexplore.ieee.org
A vocoder is used to express a speech waveform with a controllable parametric
representation that can be converted back into a speech waveform. Vocoders representing …

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

B Bollepalli, L Juvela, P Alku - arXiv preprint arXiv:1903.05955, 2019 - arxiv.org
Recent studies have shown that text-to-speech synthesis quality can be improved by using
glottal vocoding. This refers to vocoders that parameterize speech into two parts, the glottal …

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

S Takaki, J Yamagishi - 2016 IEEE International Conference on …, 2016 - ieeexplore.ieee.org
In the state-of-the-art statistical parametric speech synthesis system, a speech analysis
module, e.g. STRAIGHT spectral analysis, is generally used for obtaining accurate and stable …

GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis

M Airaksinen, B Bollepalli, L Juvela, Z Wu… - Interspeech …, 2016 - research.ed.ac.uk
GlottHMM is a previously developed vocoder that has been successfully used in HMM-
based synthesis by parameterizing speech into two parts (glottal flow, vocal tract) according …

High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network

L Juvela, B Bollepalli, M Airaksinen… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Achieving high quality and naturalness in statistical parametric synthesis of female voices
remains difficult despite recent advances in the study area. Vocoding is one such key …

Emotional voice conversion using a hybrid framework with speaker-adaptive DNN and particle-swarm-optimized neural network

S Vekkot, D Gupta, M Zakariah, YA Alotaibi - IEEE Access, 2020 - ieeexplore.ieee.org
We propose a hybrid network-based learning framework for speaker-adaptive vocal emotion
conversion, tested on three different datasets (languages), namely, EmoDB (German) …