A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by: 467 | All 2 versions

[PDF] arxiv.org

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

Cited by: 1052 | All 10 versions

[PDF] arxiv.org

FastPitch: Parallel text-to-speech with pitch prediction

A Łańcucki - ICASSP 2021-2021 IEEE International Conference …, 2021 - ieeexplore.ieee.org

We present FastPitch, a fully-parallel text-to-speech model based on FastSpeech,
conditioned on fundamental frequency contours. The model predicts pitch contours during …

Cited by: 404 | All 3 versions

[PDF] arxiv.org

Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning

Y Zhang, RJ Weiss, H Zen, Y Wu, Z Chen… - arXiv preprint arXiv …, 2019 - arxiv.org

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on
Tacotron that is able to produce high quality speech in multiple languages. Moreover, the …

Cited by: 201 | All 8 versions

[PDF] arxiv.org

Location-relative attention mechanisms for robust long-form speech synthesis

E Battenberg, RJ Skerry-Ryan… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end
text-to-speech (TTS) systems suffer from text alignment failures that increase in …

Cited by: 135 | All 6 versions

[PDF] arxiv.org

PnG BERT: Augmented BERT on phonemes and graphemes for neural TTS

Y Jia, H Zen, J Shen, Y Zhang, Y Wu - arXiv preprint arXiv:2103.15060, 2021 - arxiv.org

This paper introduces PnG BERT, a new encoder model for neural TTS. This model is
augmented from the original BERT model, by taking both phoneme and grapheme …

Cited by: 87 | All 8 versions

[PDF] arxiv.org

Mixed-phoneme BERT: Improving BERT with mixed phoneme and sup-phoneme representations for text to speech

G Zhang, K Song, X Tan, D Tan, Y Yan, Y Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech
(TTS) has drawn increasing attention. However, the works apply pre-training with character …

Cited by: 25 | All 5 versions

[PDF] arxiv.org

Cotatron: Transcription-guided speech encoder for any-to-many voice conversion without parallel data

S Park, D Kim, M Joe - arXiv preprint arXiv:2005.03295, 2020 - arxiv.org

We propose Cotatron, a transcription-guided speech encoder for speaker-independent
linguistic representation. Cotatron is based on the multispeaker TTS architecture and can be …

Cited by: 49 | All 8 versions

[PDF] ieee.org

Deep Griffin–Lim iteration: Trainable iterative phase reconstruction using neural network

Y Masuyama, K Yatabe, Y Koizumi… - IEEE Journal of …, 2020 - ieeexplore.ieee.org

In this paper, we propose a phase reconstruction framework, named Deep Griffin-Lim
Iteration (DeGLI). Phase reconstruction is a fundamental technique for improving the quality …

Cited by: 35 | All 7 versions

[PDF] arxiv.org

SoundChoice: Grapheme-to-phoneme models with semantic disambiguation

A Ploujnikov, M Ravanelli - arXiv preprint arXiv:2207.13703, 2022 - arxiv.org

End-to-end speech synthesis models directly convert the input characters into an audio
representation (e.g., spectrograms). Despite their impressive performance, such models have …

Cited by: 20 | All 5 versions

Representation mixing for TTS synthesis
