A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
Conventional and contemporary approaches used in text to speech synthesis: A review
N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis, or text to speech (TTS), the ability of a system to produce a
human-like, natural-sounding voice from written text, is gaining popularity in the field of speech …
Neural codec language models are zero-shot text to speech synthesizers
We introduce a language modeling approach for text to speech synthesis (TTS). Specifically,
we train a neural codec language model (called VALL-E) using discrete codes derived from …
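The snippet above points at the core idea: treat TTS as language modeling over discrete audio codec tokens conditioned on a text prompt. Below is a minimal, illustrative PyTorch sketch of such a codec-token language model; the vocabulary sizes, single-codebook simplification, and module shapes are assumptions for clarity, not the paper's actual configuration.

```python
# Illustrative sketch only: a tiny decoder-only language model over
# [text tokens ; audio codec tokens], in the spirit of the snippet above.
# All sizes and the single-codebook simplification are assumptions.
import torch
import torch.nn as nn

class CodecLM(nn.Module):
    def __init__(self, text_vocab=256, codec_vocab=1024, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.vocab = text_vocab + codec_vocab              # shared embedding table
        self.embed = nn.Embedding(self.vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.decoder = nn.TransformerEncoder(layer, n_layers)  # used with a causal mask
        self.head = nn.Linear(d_model, self.vocab)

    def forward(self, tokens):
        # tokens: (batch, seq) of text ids followed by codec ids (offset by text_vocab)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1)).to(tokens.device)
        h = self.decoder(self.embed(tokens), mask=causal)
        return self.head(h)                                # next-token logits

# Training step: predict each codec token from the text prompt and the
# previously seen codec tokens (standard language-model cross-entropy).
model = CodecLM()
tokens = torch.randint(0, model.vocab, (2, 128))           # fake [text ; codec] sequences
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, model.vocab),
                                   tokens[:, 1:].reshape(-1))
loss.backward()
```

At inference time, codec tokens would be sampled autoregressively from the logits and passed to the codec's decoder to reconstruct a waveform; the zero-shot behavior comes from conditioning on an acoustic prompt in the same token space.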
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
LibriTTS: A corpus derived from LibriSpeech for text-to-speech
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …
An unsupervised autoregressive model for speech representation learning
This paper proposes a novel unsupervised autoregressive neural model for learning generic
speech representations. In contrast to other speech representation learning methods that …
Generative pre-training for speech with autoregressive predictive coding
Learning meaningful and general representations from unannotated speech that are
applicable to a wide range of tasks remains challenging. In this paper we propose to use …
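As a rough illustration of the predictive-coding idea these two entries point at (train a model to predict future acoustic frames from past ones, then reuse its hidden states as general speech representations), here is a minimal sketch; the GRU encoder, 80-dimensional log-mel input, 3-frame prediction shift, and L1 loss are illustrative assumptions rather than the papers' exact setups.

```python
# Minimal sketch of autoregressive predictive coding for speech:
# an RNN reads acoustic frames and is trained to predict the frame
# `shift` steps ahead; its hidden states then serve as generic
# speech representations. Dimensions and the GRU are assumptions.
import torch
import torch.nn as nn

class APC(nn.Module):
    def __init__(self, n_mels=80, hidden=512, layers=3):
        super().__init__()
        self.rnn = nn.GRU(n_mels, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_mels)    # maps a hidden state back to a frame

    def forward(self, frames):
        # frames: (batch, time, n_mels) log-mel spectrogram
        h, _ = self.rnn(frames)
        return self.proj(h), h                   # predicted frames, representations

model = APC()
frames = torch.randn(4, 200, 80)                 # fake log-mel batch
shift = 3                                        # predict 3 frames into the future
pred, reps = model(frames[:, :-shift])
loss = nn.functional.l1_loss(pred, frames[:, shift:])
loss.backward()
# `reps` (the RNN hidden states) are what would be reused for downstream tasks.
```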
A review of deep learning based speech synthesis
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances in speech synthesis are overwhelmingly contributed by deep …
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for
prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution …
LRSpeech: Extremely low-resource speech synthesis and recognition
Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR)
are important speech tasks, and require a large number of text and speech pairs for model …