Hierarchical emotion prediction and control in text-to-speech synthesis

S Inoue, K Zhou, S Wang, H Li - ICASSP 2024 - 2024 IEEE …, 2024 - ieeexplore.ieee.org
It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS)
synthesis. Prior studies have primarily focused on learning a global prosodic representation …

Emotional dimension control in language model-based text-to-speech: Spanning a broad spectrum of human emotions

K Zhou, Y Zhang, S Zhao, H Wang, Z Pan, D Ng… - arXiv preprint arXiv …, 2024 - arxiv.org
Current emotional text-to-speech (TTS) systems face challenges in mimicking a broad
spectrum of human emotions due to the inherent complexity of emotions and limitations in …

Fine-grained quantitative emotion editing for speech generation

S Inoue, K Zhou, S Wang, H Li - 2024 Asia Pacific Signal and …, 2024 - ieeexplore.ieee.org
It remains a significant challenge how to quantitatively control the expressiveness of speech
emotion in speech generation. In this work, we propose an approach for quantitative …

EMOCONV-Diff: Diffusion-based speech emotion conversion for non-parallel and in-the-wild data

NR Prabhu, B Lay, S Welker… - ICASSP 2024 - 2024 …, 2024 - ieeexplore.ieee.org
Speech emotion conversion is the task of converting the expressed emotion of a spoken
utterance to a target emotion while preserving the lexical content and speaker identity. While …

Hierarchical Control of Emotion Rendering in Speech Synthesis

S Inoue, K Zhou, S Wang, H Li - arXiv preprint arXiv:2412.12498, 2024 - arxiv.org
Emotional text-to-speech synthesis (TTS) aims to generate realistic emotional speech from
input text. However, quantitatively controlling multi-level emotion rendering remains …

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion

K Zhou, B Sisman, C Busso, B Ma, H Li - arXiv preprint arXiv:2210.13756, 2022 - arxiv.org
Emotional voice conversion (EVC) traditionally targets the transformation of spoken
utterances from one emotional state to another, with previous research mainly focusing on …