A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis, or text to speech (TTS), the ability of a system to produce a human-like,
natural-sounding voice from written text, is gaining popularity in the field of speech …

Neural codec language models are zero-shot text to speech synthesizers

C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

An unsupervised autoregressive model for speech representation learning

YA Chung, WN Hsu, H Tang, J Glass - arXiv preprint arXiv:1904.03240, 2019 - arxiv.org
This paper proposes a novel unsupervised autoregressive neural model for learning generic
speech representations. In contrast to other speech representation learning methods that …

Generative pre-training for speech with autoregressive predictive coding

YA Chung, J Glass - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Learning meaningful and general representations from unannotated speech that are
applicable to a wide range of tasks remains challenging. In this paper we propose to use …

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasing attention.
Recent advances in speech synthesis are overwhelmingly contributed by deep …

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

G Sun, Y Zhang, RJ Weiss, Y Cao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for
prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution …

LRSpeech: Extremely low-resource speech synthesis and recognition

J Xu, X Tan, Y Ren, T Qin, J Li, S Zhao… - Proceedings of the 26th …, 2020 - dl.acm.org
Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR)
are important speech tasks, and they require large amounts of paired text and speech data for model …