An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Emotion rendering for conversational speech synthesis with heterogeneous graph-based context modeling

R Liu, Y Hu, Y Ren, X Yin, H Li - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the
appropriate prosody and emotional inflection within a conversational setting. While …

EmoDiff: Intensity controllable emotional text-to-speech with soft-label guidance

Y Guo, C Du, X Chen, K Yu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Although current neural text-to-speech (TTS) models are able to generate high-quality
speech, intensity controllable emotional TTS is still a challenging task. Most existing …

An overview & analysis of sequence-to-sequence emotional voice conversion

Z Yang, X Jing, A Triantafyllopoulos, M Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source
to a target emotion; it can thus be a key enabling technology for human-computer interaction …

EmoMix: Emotion mixing via diffusion models for emotional speech synthesis

H Tang, X Zhang, J Wang, N Cheng, J Xiao - arXiv preprint arXiv …, 2023 - arxiv.org
There has been significant progress in emotional Text-To-Speech (TTS) synthesis
technology in recent years. However, existing methods primarily focus on the synthesis of a …

Speech based suicide risk recognition for crisis intervention hotlines using explainable multi-task learning

Z Ding, Y Zhou, AJ Dai, C Qian, BL Zhong… - Journal of Affective …, 2025 - Elsevier
Background: Crisis Intervention Hotline can effectively reduce suicide risk, but suffer
from low connectivity rates and untimely crisis response. By integrating speech signals and …

Probing speech emotion recognition transformers for linguistic knowledge

A Triantafyllopoulos, J Wagner, H Wierstorf… - arXiv preprint arXiv …, 2022 - arxiv.org
Large, pre-trained neural networks consisting of self-attention layers (transformers) have
recently achieved state-of-the-art results on several speech emotion recognition (SER) …

Disentanglement of emotional style and speaker identity for expressive voice conversion

Z Du, B Sisman, K Zhou, H Li - arXiv preprint arXiv:2110.10326, 2021 - arxiv.org
Expressive voice conversion performs identity conversion for emotional speakers by jointly
converting speaker identity and emotional style. Due to the hierarchical structure of speech …

Hierarchical emotion prediction and control in text-to-speech synthesis

S Inoue, K Zhou, S Wang, H Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS)
synthesis. Prior studies have primarily focused on learning a global prosodic representation …