A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

DiffSinger: Singing voice synthesis via shallow diffusion mechanism

J Liu, C Li, Y Ren, F Chen, Z Zhao - … of the AAAI conference on artificial …, 2022 - ojs.aaai.org
Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive
singing voice, in which the acoustic model generates the acoustic features (e.g., mel …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

M4Singer: A multi-style, multi-singer and musical score provided Mandarin singing corpus

L Zhang, R Li, S Wang, L Deng, J Liu… - Advances in …, 2022 - proceedings.neurips.cc
The lack of publicly available high-quality and accurately labeled datasets has long been a
major bottleneck for singing voice synthesis (SVS). To tackle this problem, we present …

A review of differentiable digital signal processing for music and speech synthesis

B Hayes, J Shier, G Fazekas, A McPherson… - Frontiers in Signal …, 2024 - frontiersin.org
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …

Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias

Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in
achieving timbre and speech style generalization, particularly in zero-shot TTS. However …

The singing voice conversion challenge 2023

WC Huang, LP Violeta, S Liu, J Shi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …

Multi-singer: Fast multi-singer singing voice vocoder with a large-scale corpus

R Huang, F Chen, Y Ren, J Liu, C Cui… - Proceedings of the 29th …, 2021 - dl.acm.org
High-fidelity multi-singer singing voice synthesis is challenging for neural vocoder due to the
singing voice data shortage, limited singer generalization, and large computational cost …

Opencpop: A high-quality open source Chinese popular song corpus for singing voice synthesis

Y Wang, X Wang, P Zhu, J Wu, H Li, H Xue… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …

VISinger: Variational inference with adversarial learning for end-to-end singing voice synthesis

Y Zhang, J Cong, H Xue, L Xie… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper, we propose VISinger, a complete end-to-end high-quality singing voice
synthesis (SVS) system that directly generates singing audio from lyrics and musical score …