A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable …
VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder …
Controllable accented text-to-speech synthesis with fine and coarse-grained intensity rendering
Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1), which is challenging as L2 is different from L1 in terms …
EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance
Although current neural text-to-speech (TTS) models are able to generate high-quality speech, intensity controllable emotional TTS is still a challenging task. Most existing …
Autoregressive diffusion transformer for text-to-speech synthesis
Audio language models have recently emerged as a promising approach for various audio generation tasks, relying on audio tokenizers to encode waveforms into sequences of …
PromptTTS++: Controlling speaker identity in prompt-based text-to-speech using natural language descriptions
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions. To control speaker …
DiffProsody: Diffusion-based latent prosody generation for expressive speech synthesis with prosody conditional adversarial training
Expressive text-to-speech systems have undergone significant advancements owing to prosody modeling, but conventional methods can still be improved. Traditional approaches …
ControlSpeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style …
Speaker adaptive text-to-speech with timbre-normalized vector-quantized feature
Achieving high fidelity and speaker similarity in text-to-speech speaker adaptation with a limited amount of data is a challenging task. Most existing methods only consider adapting …
Acoustic modeling for end-to-end empathetic dialogue speech synthesis using linguistic and prosodic contexts of dialogue history
We propose an end-to-end empathetic dialogue speech synthesis (DSS) model that considers both the linguistic and prosodic contexts of dialogue history. Empathy is the active …