Recent advances in convolutional neural networks

J Gu, Z Wang, J Kuen, L Ma, A Shahroudy, B Shuai… - Pattern Recognition, 2018 - Elsevier
In the last few years, deep learning has led to very good performance on a variety of
problems, such as visual recognition, speech recognition and natural language processing …

Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …

Generative adversarial network-based postfilter for statistical parametric speech synthesis

T Kaneko, H Kameoka, N Hojo, Y Ijima… - … on acoustics, speech …, 2017 - ieeexplore.ieee.org
We propose a postfilter based on a generative adversarial network (GAN) to compensate for
the differences between natural speech and speech synthesized by statistical parametric …

Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks

CV Botinhao, X Wang, S Takaki, J Yamagishi - Interspeech 2016, 2016 - research.ed.ac.uk
Quality of text-to-speech voices built from noisy recordings is diminished. In order to improve
it we propose the use of a recurrent neural network to enhance acoustic parameters prior to …

Review of end-to-end speech synthesis technology based on deep learning

Z Mu, X Yang, Y Dong - arXiv preprint arXiv:2104.09995, 2021 - arxiv.org
As an indispensable part of modern human-computer interaction system, speech synthesis
technology helps users get the output of intelligent machine more easily and intuitively, thus …

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

Y Yasuda, X Wang, J Yamagishi - Computer Speech & Language, 2021 - Elsevier
Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality
speech directly from text or simple linguistic features such as phonemes. Unlike traditional …

Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training

Z Wu, S King - IEEE/ACM Transactions on Audio, Speech, and …, 2016 - ieeexplore.ieee.org
We propose two novel techniques, stacking bottleneck features and minimum generation
error (MGE) training criterion, to improve the performance of deep neural network (DNN) …

HypernasalityNet: Deep recurrent neural network for automatic hypernasality detection

X Wang, S Yang, M Tang, H Yin, H Huang… - International Journal of …, 2019 - Elsevier
Background: Cleft palate patients have inability to produce adequate velopharyngeal
closure, which results in hypernasal speech. In clinic, hypernasal speech is assessed …

Deep Elman recurrent neural networks for statistical parametric speech synthesis

S Achanta, SV Gangashetty - Speech Communication, 2017 - Elsevier
Owing to the success of deep learning techniques in automatic speech recognition, deep
neural networks (DNNs) have been used as acoustic models for statistical parametric …

Deepconversion: Voice conversion with limited parallel training data

M Zhang, B Sisman, L Zhao, H Li - Speech Communication, 2020 - Elsevier
A deep neural network approach to voice conversion usually depends on a large amount of
parallel training data from source and target speakers. In this paper, we propose a novel …