Deep learning-based expressive speech synthesis: a systematic review of approaches, challenges, and resources

H Barakat, O Turk, C Demiroglu - EURASIP Journal on Audio, Speech, and …, 2024 - Springer
Speech synthesis has made significant strides thanks to the transition from machine learning
to deep learning models. Contemporary text-to-speech (TTS) models possess the capability …

BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100k hours of data

M Łajszczak, G Cámbara, Y Li, F Beyhan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big
Adaptive Streamable TTS with Emergent abilities …

Not my voice! A taxonomy of ethical and safety harms of speech generators

W Hutiri, O Papakyriakopoulos, A Xiang - Proceedings of the 2024 ACM …, 2024 - dl.acm.org
The rapid and wide-scale adoption of AI to generate human speech poses a range of
significant ethical and safety risks to society that need to be addressed. For example, a …

SLIM: Style-linguistics mismatch model for generalized audio deepfake detection

Y Zhu, S Koppisetti, T Tran… - Advances in Neural …, 2025 - proceedings.neurips.cc
Audio deepfake detection (ADD) is crucial to combat the misuse of speech synthesized by
generative AI models. Existing ADD models suffer from generalization issues to unseen …

Beyond Deep Learning: Charting the Next Frontiers of Affective Computing

A Triantafyllopoulos, L Christ, A Gebhard… - Intelligent …, 2024 - spj.science.org
Affective computing (AC), like most other areas of computational research, has benefited
tremendously from advances in deep learning (DL). These advances have opened up new …

Improved dendritic learning: Activation function analysis

Y Wang, Y Yu, T Zhang, K Song, Y Wang, S Gao - Information Sciences, 2024 - Elsevier
This study conducted a thorough evaluation of an improved dendritic learning (DL)
framework, focusing specifically on its application in power load forecasting. The objective …

Hierarchical emotion prediction and control in text-to-speech synthesis

S Inoue, K Zhou, S Wang, H Li - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
It remains a challenge to effectively control the emotion rendering in text-to-speech (TTS)
synthesis. Prior studies have primarily focused on learning a global prosodic representation …

MDRT: Multi-domain synthetic speech localization

AKS Yadav, K Bhagtani, S Baireddy… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
With recent advancements in generating synthetic speech, tools to generate high-quality
synthetic speech impersonating any human speaker are easily available. Several incidents …

Expressivity and speech synthesis

A Triantafyllopoulos, BW Schuller - arXiv preprint arXiv:2404.19363, 2024 - arxiv.org
Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence
(AI) research. From the very beginning, the community has not only aimed to synthesise high …

Emotional dimension control in language model-based text-to-speech: Spanning a broad spectrum of human emotions

K Zhou, Y Zhang, S Zhao, H Wang, Z Pan, D Ng… - arXiv preprint arXiv …, 2024 - arxiv.org
Current emotional text-to-speech (TTS) systems face challenges in mimicking a broad
spectrum of human emotions due to the inherent complexity of emotions and limitations in …