[BOOK] Text-to-speech synthesis using found data for low-resource languages

E Cooper - 2019 - search.proquest.com
Text-to-speech synthesis is a key component of interactive, speech-based systems.
Typically, building a high-quality voice requires collecting dozens of hours of speech from a …

[PDF] Subjective and Objective Evaluation of Speech Intelligibility Enhancement Under Constant Energy and Duration Constraints

Y Tang, M Cooke - Interspeech, 2011 - researchgate.net
Speakers appear to adopt strategies to improve speech intelligibility for interlocutors in
adverse acoustic conditions. Generated speech, whether synthetic, recorded or live, may …

Utterance selection for optimizing intelligibility of TTS voices trained on ASR data

E Cooper, X Wang - Interspeech 2017, 2017 - par.nsf.gov
This paper describes experiments in training HMM-based text-to-speech (TTS) voices on
data collected for Automatic Speech Recognition (ASR) training. We compare a number of …

Can objective measures predict the intelligibility of modified HMM-based synthetic speech in noise?

C Valentini-Botinhao, J Yamagishi… - Interspeech 2011: 12th …, 2011 - research.ed.ac.uk
Synthetic speech can be modified to improve intelligibility in noise. In order to perform
modifications automatically, it would be useful to have an objective measure that could …

Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion

C Valentini-Botinhao, J Yamagishi, S King… - Computer Speech & …, 2014 - Elsevier
This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM)
generated synthetic speech in noise. We present a method for modifying the Mel cepstral …

Multimodal physiological quality-of-experience assessment of text-to-speech systems

R Gupta, HJ Banville, TH Falk - IEEE Journal of Selected Topics …, 2016 - ieeexplore.ieee.org
With the growing complexity of various text-to-speech systems, it is becoming more
important to understand the underlying perceptual and judgement processes that drive user …

Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise

C Valentini-Botinhao, R Maia… - … , Speech and Signal …, 2012 - ieeexplore.ieee.org
In this paper we introduce a new cepstral coefficient extraction method based on an
intelligibility measure for speech in noise, the Glimpse Proportion measure. This new …

The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

S Arif, T Arif, MS Haroon, AJ Khan, AA Raza… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces the concept of an education tool that utilizes Generative Artificial
Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven …

Emilia: a speech corpus for Argentine Spanish text to speech synthesis

HM Torres, JA Gurlekian, DA Evin… - Language Resources …, 2019 - Springer
This paper introduces Emilia, a speech corpus created to build a female voice in Spanish
spoken in Buenos Aires for the Aromo text-to-speech system. Aromo is a unit selection text …

Fusion of magnitude and phase-based features for objective evaluation of TTS voice

HB Sailor, HA Patil - The 9th International Symposium on …, 2014 - ieeexplore.ieee.org
This paper analyzes the distance-based objective measures for evaluation of
Text-to-Speech (TTS) systems (which are commonly used objective measures). In this paper, we …