A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

End-to-end text-to-speech for low-resource languages by cross-lingual transfer learning

T Tu, YJ Chen, C Yeh, HY Lee - arXiv preprint arXiv:1904.06508, 2019 - arxiv.org
End-to-end text-to-speech (TTS) has shown great success on large quantities of paired text
plus speech data. However, laborious data collection remains difficult for at least 95% of the …

A systematic review and analysis of multilingual data strategies in text-to-speech for low-resource languages

P Do, M Coler, J Dijkstra, E Klabbers - Interspeech 2021, 2021 - research.rug.nl
We provide a systematic review of past studies that use multilingual data for text-to-speech
(TTS) of low-resource languages (LRLs). We focus on the strategies used by these studies …

Multi-Language Multi-Speaker Acoustic Modeling for LSTM-RNN Based Statistical Parametric Speech Synthesis

B Li, H Zen - Interspeech, 2016 - isca-archive.org
Building text-to-speech (TTS) systems requires large amounts of high quality speech
recordings and annotations, which is a challenge to collect especially considering the …

Accented text-to-speech synthesis with limited data

X Zhou, M Zhang, Y Zhou, Z Wu… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
This paper presents an accented text-to-speech (TTS) synthesis framework with limited
training data. We study two aspects concerning accent rendering: phonetic (phoneme …

Unsupervised learning for sequence-to-sequence text-to-speech for low-resource languages

H Zhang, Y Lin - arXiv preprint arXiv:2008.04549, 2020 - arxiv.org
Recently, sequence-to-sequence models with attention have been successfully applied in
Text-to-speech (TTS). These models can generate near-human speech with a large …

Exploring the role of language families for building Indic speech synthesisers

A Prakash, HA Murthy - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Building end-to-end speech synthesisers for Indian languages is challenging, given the lack
of adequate clean training data and multiple grapheme representations across languages …

Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation

T Wang, J Tao, R Fu, J Yi, Z Wen, R Zhong - Interspeech, 2020 - researchgate.net
The low similarity and naturalness of synthesized speech remain a challenging problem for
speaker adaptation with few resources. Since the acoustic model is too complex to interpret …

Discovering phonetic inventories with crosslingual automatic speech recognition

P Żelasko, S Feng, LM Velazquez, A Abavisani… - Computer Speech & …, 2022 - Elsevier
The high cost of data acquisition makes Automatic Speech Recognition (ASR) model
training problematic for most existing languages, including languages that do not even have …

Prosody and voice factorization for few-shot speaker adaptation in the challenge M2VoC 2021

T Wang, R Fu, J Yi, J Tao, Z Wen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The paper describes the CASIA speech synthesis system entry for challenge M2VoC 2021.
The low similarity and naturalness of synthesized speech remains a challenging problem for …