A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Nautilus: a versatile voice cloning system

HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …

USAT: A Universal Speaker-Adaptive Text-to-Speech Approach

W Wang, Y Song, S Jha - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the
quality of synthesized speech for speakers in the training dataset. The challenge of …

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

H Guo, F Xie, J Kang, Y Xiao, X Wu… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS
quality with lower supervised data requirements via Vector-Quantized Self-Supervised …

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis

D Xin, Y Saito, S Takamichi, T Koriyama… - Interspeech, 2021 - isca-archive.org
We present a cross-lingual speaker adaptation method based on domain adaptation and a
speaker consistency loss for text-to-speech (TTS) synthesis. Existing monolingual speaker …

Advancing Accessibility: Voice Cloning and Speech Synthesis for Individuals with Speech Disorders

LD Anand, DJ Reji - arXiv preprint arXiv:2401.11771, 2024 - arxiv.org
Neural Text-to-speech (TTS) synthesis is a powerful technology that can generate speech
using neural networks. One of the most remarkable features of TTS synthesis is its capability …

On prosody modeling for ASR+TTS based voice conversion

WC Huang, T Hayashi, X Li… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In voice conversion (VC), an approach showing promising results in the latest voice
conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) …

GC-TTS: Few-shot speaker adaptation with geometric constraints

JH Kim, SH Lee, JH Lee, HG Jung… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system that aims to
reproduce a novel speaker's voice with a few training data. While numerous attempts have …

Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis

B Lőrincz, A Stan, M Giurgiu - 2021 29th European Signal …, 2021 - ieeexplore.ieee.org
Building multispeaker neural network-based text-to-speech synthesis systems commonly
relies on the availability of large amounts of high quality recordings from each speaker and …