Google Scholar

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 469

End-to-end adversarial text-to-speech

J Donahue, S Dieleman, M Bińkowski, E Elsen… - arXiv preprint arXiv …, 2020 - arxiv.org

Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each
of which is designed or learnt independently from the rest. In this work, we take on the …

Cited by 228

Parallel Tacotron: Non-autoregressive and controllable TTS

I Elias, H Zen, J Shen, Y Zhang, Y Jia… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

Although neural end-to-end text-to-speech models can synthesize highly natural speech,
there is still room for improvements to its efficiency and naturalness. This paper proposes a …

Cited by 133

RALL-E: Robust codec language modeling with chain-of-thought prompting for text-to-speech synthesis

D Xin, X Tan, K Shen, Z Ju, D Yang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis.
While previous work based on large language models (LLMs) shows impressive …

Cited by 29

A vector quantized approach for text to speech synthesis on real-world spontaneous speech

LW Chen, S Watanabe, A Rudnicky - Proceedings of the AAAI …, 2023 - ojs.aaai.org

Recent Text-to-Speech (TTS) systems trained on reading or acted corpora have
achieved near human-level naturalness. The diversity of human speech, however, often …

Cited by 41

MultiSpeech: Multi-speaker text to speech with Transformer

M Chen, X Tan, Y Ren, J Xu, H Sun, S Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org

Transformer-based text to speech (TTS) model (e.g., Transformer TTS, FastSpeech)
has shown the advantages of training and inference …

Cited by 125

Non-Attentive Tacotron: Robust and controllable neural TTS synthesis including unsupervised duration modeling

J Shen, Y Jia, M Chrzanowski, Y Zhang, I Elias… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model,
replacing the attention mechanism with an explicit duration predictor. This improves …

Cited by 104

VALL-E R: Robust and efficient zero-shot text-to-speech synthesis via monotonic alignment

B Han, L Zhou, S Liu, S Chen, L Meng, Y Qian… - arXiv preprint arXiv …, 2024 - arxiv.org

With the help of discrete neural audio codecs, large language models (LLM) have
increasingly been recognized as a promising methodology for zero-shot Text-to-Speech …

Cited by 17

Location-relative attention mechanisms for robust long-form speech synthesis

E Battenberg, RJ Skerry-Ryan… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Despite the ability to produce human-level speech for in-domain text, attention-based
end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in …

Cited by 133

Parallel Tacotron 2: A non-autoregressive neural TTS model with differentiable duration modeling

I Elias, H Zen, J Shen, Y Zhang, Y Jia… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech
model with a fully differentiable duration model which does not require supervised duration …

Cited by 77

Robust sequence-to-sequence acoustic modeling with stepwise monotonic attention for neural TTS

A survey on neural speech synthesis
