Towards end-to-end prosody transfer for expressive speech synthesis with tacotron

RJ Skerry-Ryan, E Battenberg, Y Xiao… - international …, 2018 - proceedings.mlr.press
We present an extension to the Tacotron speech synthesis architecture that learns a latent
embedding space of prosody, derived from a reference acoustic representation containing …

Towards long-term social child-robot interaction: using multi-activity switching to engage young users

A Coninx, P Baxter, E Oleari, S Bellini… - Journal of Human …, 2016 - uhra.herts.ac.uk
Social robots have the potential to provide support in a number of practical domains, such as
learning and behaviour change. This potential is particularly relevant for children, who have …

Improving prosody modelling with cross-utterance bert embeddings for end-to-end speech synthesis

G Xu, W Song, Z Zhang, C Zhang… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Although speech prosody is related to the linguistic information up to the discourse structure,
most text-to-speech (TTS) systems only take into account the information within each …

Different parts of the same elephant: A roadmap to disentangle and connect different perspectives on prosodic prominence

P Wagner, A Origlia, C Avezani… - … Congress of Phonetic …, 2015 - shs.hal.science
Prosodic prominence is an umbrella term encompassing various related but conceptually
and functionally different phenomena such as phonological stress, paralinguistic emphasis …

Improving unsupervised style transfer in end-to-end speech synthesis with end-to-end speech recognition

DR Liu, CY Yang, SL Wu, HY Lee - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
End-to-end TTS model can directly take an utterance as reference, and generate speech
from the text with prosody and speaker characteristics similar to the reference utterance …

[PDF] Controlling Prominence Realisation in Parametric DNN-Based Speech Synthesis.

Z Malisz, H Berthelsen, J Beskow, J Gustafson - Interspeech, 2017 - speech.kth.se
This work aims to improve text-to-speech synthesis for Wikipedia by advancing and
implementing models of prosodic prominence. We propose a new system architecture with …

[BOOK] Towards Expressive Perception and Generation in Human-Computer Conversational Interaction

Y Bu, R Li, Z You - 2024 - books.google.com
The construction of a natural interactive human-computer interaction has become an integral
component of intelligent system development, constituting a core subject within the field of …

[BOOK] Speech and Automata in Health Care

A Neustein - 2014 - degruyter.com
It is often so hard to pinpoint the genesis of a book. Ideas and concepts are naturally fluent;
they float around in our mind until one day these many feathers of thoughts settle and come …

[PDF] Suprasegmental representations for the modeling of fundamental frequency in statistical parametric speech synthesis

MFSB Ribeiro - 2018 - core.ac.uk
Statistical parametric speech synthesis (SPSS) has seen improvements over recent years,
especially in terms of intelligibility. Synthetic speech is often clear and understandable, but it …

[PDF] ASR and TTS for Voice Controlled Child-Robot Interactions for Treating Children with Metabolic Disorders

G Sommavilla, F Tesser, G Paci, P Cosi - pd.istc.cnr.it
Artificial companion agents are becoming increasingly important in the field of healthcare,
particularly when children are involved, with the aim of providing novel educational tools …