A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
End-to-end adversarial text-to-speech
Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each
of which is designed or learnt independently from the rest. In this work, we take on the …
Parallel Tacotron: Non-autoregressive and controllable TTS
Although neural end-to-end text-to-speech models can synthesize highly natural speech,
there is still room for improvements to its efficiency and naturalness. This paper proposes a …
RALL-E: Robust codec language modeling with chain-of-thought prompting for text-to-speech synthesis
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis.
While previous work based on large language models (LLMs) shows impressive …
A vector quantized approach for text to speech synthesis on real-world spontaneous speech
Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …
MultiSpeech: Multi-speaker text to speech with transformer
Transformer-based text to speech (TTS) model (e.g., Transformer TTS~\cite{li2019neural},
FastSpeech~\cite{ren2019fastspeech}) has shown the advantages of training and inference …
Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling
This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model,
replacing the attention mechanism with an explicit duration predictor. This improves …
VALL-E R: Robust and efficient zero-shot text-to-speech synthesis via monotonic alignment
With the help of discrete neural audio codecs, large language models (LLM) have
increasingly been recognized as a promising methodology for zero-shot Text-to-Speech …
Location-relative attention mechanisms for robust long-form speech synthesis
Despite the ability to produce human-level speech for in-domain text, attention-based end-to-
end text-to-speech (TTS) systems suffer from text alignment failures that increase in …
Parallel Tacotron 2: A non-autoregressive neural TTS model with differentiable duration modeling
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech
model with a fully differentiable duration model which does not require supervised duration …