A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
DiffSinger: Singing voice synthesis via shallow diffusion mechanism
Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive
singing voice, in which the acoustic model generates the acoustic features (e.g., mel …
Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
M4Singer: A multi-style, multi-singer and musical score provided Mandarin singing corpus
The lack of publicly available high-quality and accurately labeled datasets has long been a
major bottleneck for singing voice synthesis (SVS). To tackle this problem, we present …
A review of differentiable digital signal processing for music and speech synthesis
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in
achieving timbre and speech style generalization, particularly in zero-shot TTS. However …
The singing voice conversion challenge 2023
We present the latest iteration of the voice conversion challenge (VCC) series, a bi-annual
scientific event aiming to compare and understand different voice conversion (VC) systems …
Multi-singer: Fast multi-singer singing voice vocoder with a large-scale corpus
High-fidelity multi-singer singing voice synthesis is challenging for neural vocoder due to the
singing voice data shortage, limited singer generalization, and large computational cost …
Opencpop: A high-quality open source Chinese popular song corpus for singing voice synthesis
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus
designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin …
VISinger: Variational inference with adversarial learning for end-to-end singing voice synthesis
In this paper, we propose VISinger, a complete end-to-end high-quality singing voice
synthesis (SVS) system that directly generates singing audio from lyrics and musical score …