A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
Nautilus: a versatile voice cloning system
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …
The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading ASR and TTS
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the
quality of synthesized speech for speakers in the training dataset. The challenge of …
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS
quality with lower supervised data requirements via Vector-Quantized Self-Supervised …
Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis.
We present a cross-lingual speaker adaptation method based on domain adaptation and a
speaker consistency loss for text-to-speech (TTS) synthesis. Existing monolingual speaker …
Advancing Accessibility: Voice Cloning and Speech Synthesis for Individuals with Speech Disorders
LD Anand, DJ Reji - arXiv preprint arXiv:2401.11771, 2024 - arxiv.org
Neural Text-to-speech (TTS) synthesis is a powerful technology that can generate speech
using neural networks. One of the most remarkable features of TTS synthesis is its capability …
On prosody modeling for ASR+TTS based voice conversion
In voice conversion (VC), an approach showing promising results in the latest voice
conversion challenge (VCC) 2020 is to first use an automatic speech recognition (ASR) …
GC-TTS: Few-shot speaker adaptation with geometric constraints
Few-shot speaker adaptation is a specific Text-to-Speech (TTS) system that aims to
reproduce a novel speaker's voice with a few training data. While numerous attempts have …
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
Building multispeaker neural network-based text-to-speech synthesis systems commonly
relies on the availability of large amounts of high quality recordings from each speaker and …