- Academic Search

Cited by: 70 · All 2 versions

Deep encoder-decoder models for unsupervised learning of controllable speech synthesis

GE Henter, J Lorenzo-Trueba, X Wang… - arXiv preprint arXiv …, 2018 - arxiv.org

Generating versatile and appropriate synthetic speech requires control over the output
expression separate from the spoken text. Important non-textual speech variation is seldom …

Cited by: 93 · All 9 versions

From HMMs to DNNs: where do the improvements come from?

O Watts, GE Henter, T Merritt, Z Wu… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

Deep neural networks (DNNs) have recently been the focus of much text-to-speech research
as a replacement for decision trees and hidden Markov models (HMMs) in statistical …

Cited by: 24 · All 7 versions

Neural HMMs are all you need (for high-quality attention-free TTS)

S Mehta, É Székely, J Beskow… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Neural sequence-to-sequence TTS has achieved significantly better output quality than
statistical speech synthesis using HMMs. However, neural TTS is generally not probabilistic …

Cited by: 15 · All 10 versions

OverFlow: Putting flows on top of neural transducers for better TTS

S Mehta, A Kirkland, H Lameris, J Beskow… - arXiv preprint arXiv …, 2022 - arxiv.org

Neural HMMs are a type of neural transducer recently proposed for sequence-to-sequence
modelling in text-to-speech. They combine the best features of classic statistical speech …

Cited by: 76 · All 6 versions

Deep neural network-guided unit selection synthesis

T Merritt, RAJ Clark, Z Wu… - … on Acoustics, Speech …, 2016 - ieeexplore.ieee.org

Vocoding of speech is a standard part of statistical parametric speech synthesis systems. It
imposes an upper bound on the naturalness that can possibly be achieved. Hybrid systems …

Cited by: 23 · All 8 versions

Ctrl-P: Temporal control of prosodic variation for speech synthesis

DSR Mohan, V Hu, TH Teh, A Torresquintero… - arXiv preprint arXiv …, 2021 - arxiv.org

Text does not fully specify the spoken form, so text-to-speech models must be able to learn
from speech data that vary in ways not explained by the corresponding text. One way to …

Cited by: 57 · All 5 versions

An autoregressive recurrent mixture density network for parametric speech synthesis

X Wang, S Takaki, J Yamagishi - 2017 IEEE international …, 2017 - ieeexplore.ieee.org

Neural-network-based generative models, such as mixture density networks, are potential
solutions for speech synthesis. In this paper we follow this path and propose a recurrent …

Cited by: 49 · All 4 versions

Principles for learning controllable TTS from annotated and latent variation

G Henter, J Lorenzo-Trueba, X Wang… - Interspeech …, 2017 - research.ed.ac.uk

For building flexible and appealing high-quality speech synthesisers, it is desirable to be
able to accommodate and reproduce fine variations in vocal expression present in natural …