An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

An overview of noise-robust automatic speech recognition

J Li, L Deng, Y Gong… - IEEE/ACM Transactions …, 2014 - ieeexplore.ieee.org
New waves of consumer-centric applications, such as voice search and voice interaction
with mobile devices and home entertainment systems, increasingly require automatic …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Natural tts synthesis by conditioning wavenet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative ai

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech.

C Valentini-Botinhao, X Wang, S Takaki, J Yamagishi - SSW, 2016 - isca-archive.org
The quality of text-to-speech (TTS) voices built from noisy speech is compromised.
Enhancing the speech data before training has been shown to improve quality but voices …

Statistical parametric speech synthesis using deep neural networks

H Zen, A Senior, M Schuster - 2013 ieee international …, 2013 - ieeexplore.ieee.org
Conventional approaches to statistical parametric speech synthesis typically use decision
tree-clustered context-dependent hidden Markov models (HMMs) to represent probability …

TTS synthesis with bidirectional LSTM based recurrent neural networks

Y Fan, Y Qian, FL Xie, FK Soong - Fifteenth annual conference of …, 2014 - isca-archive.org
Feed-forward, Deep neural networks (DNN)-based text-to-speech (TTS) systems have been
recently shown to outperform decision-tree clustered context-dependent HMM TTS systems …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …