A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

AdaSpeech: Adaptive text to speech for custom voice

M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao… - arXiv preprint arXiv …, 2021 - arxiv.org
Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims
to adapt a source TTS model to synthesize personal voice for a target speaker using few …

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org
As an indispensable part of modern human-computer interaction systems, speech synthesis
technology helps users get the output of intelligent machines more easily and intuitively, thus …

Mega-TTS 2: Boosting prompting mechanisms for zero-shot speech synthesis

Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot text-to-speech (TTS) aims to synthesize voices with unseen speech prompts,
which significantly reduces the data and computation requirements for voice cloning by …

VALL-E R: Robust and efficient zero-shot text-to-speech synthesis via monotonic alignment

B Han, L Zhou, S Liu, S Chen, L Meng, Y Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
With the help of discrete neural audio codecs, large language models (LLM) have
increasingly been recognized as a promising methodology for zero-shot Text-to-Speech …

USAT: A universal speaker-adaptive text-to-speech approach

W Wang, Y Song, S Jha - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the
quality of synthesized speech for speakers in the training dataset. The challenge of …

AdaSpeech 2: Adaptive text to speech with untranscribed data

Y Yan, X Tan, B Li, T Qin, S Zhao… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Text to speech (TTS) is widely used to synthesize personal voice for a target speaker, where
a well-trained source TTS model is fine-tuned with few paired adaptation data (speech and …

GANSpeech: Adversarial training for high-fidelity multi-speaker speech synthesis

J Yang, JS Bae, T Bak, Y Kim, HY Cho - arXiv preprint arXiv:2106.15153, 2021 - arxiv.org
Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the
generation of reasonably good speech quality with a single model and made it possible to …

Speaker generation

D Stanton, M Shannon, S Mariooryad… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This work explores the task of synthesizing speech in non-existent human-sounding voices.
We call this task "speaker generation", and present TacoSpawn, a system that performs …

An Overview of Deep Neural Networks for Few-Shot Learning

J Zhao, L Kong, J Lv - Big Data Mining and Analytics, 2024 - ieeexplore.ieee.org
Recent advancements in deep learning have led to significant breakthroughs across various
fields. However, these methods often require extensive labeled data for optimal …