A deep learning approaches in text-to-speech system: a systematic review and recent research perspective

Y Kumar, A Koul, C Singh - Multimedia Tools and Applications, 2023 - Springer
Text-to-speech systems (TTS) have come a long way in the last decade and are now a
popular research topic for creating various human-computer interaction systems. Although, a …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arxiv preprint arxiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Glow-tts: A generative flow for text-to-speech via monotonic alignment search

J Kim, S Kim, J Kong, S Yoon - Advances in Neural …, 2020 - proceedings.neurips.cc
Abstract Recently, text-to-speech (TTS) models such as FastSpeech and ParaNet have been
proposed to generate mel-spectrograms from text in parallel. Despite the advantage, the …

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-theart performance, they still suffer from two problems: 1) low efficiency …

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative ai

C Zhang, C Zhang, S Zheng, M Zhang… - arxiv preprint arxiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Towards end-to-end synthetic speech detection

G Hua, ABJ Teoh, H Zhang - IEEE Signal Processing Letters, 2021 - ieeexplore.ieee.org
The constant Q transform (CQT) has been shown to be one of the most effective speech
signal pre-transforms to facilitate synthetic speech detection, followed by either hand-crafted …

SampleRNN: An unconditional end-to-end neural audio generation model

S Mehri, K Kumar, I Gulrajani, R Kumar, S Jain… - arxiv preprint arxiv …, 2016 - arxiv.org
In this paper we propose a novel model for unconditional audio generation based on
generating one audio sample at a time. We show that our model, which profits from …

Char2wav: End-to-end speech synthesis

J Sotelo, S Mehri, K Kumar, JF Santos, K Kastner… - 2017 - openreview.net
We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two
components: a reader and a neural vocoder. The reader is an encoder-decoder model with …