An overview of affective speech synthesis and conversion in the deep learning era
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …
Emotional voice conversion: Theory, databases and ESD
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …
Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …
Textless speech emotion conversion using discrete and decomposed representations
Speech emotion conversion is the task of modifying the perceived emotion of a speech
utterance while preserving the lexical content and speaker identity. In this study, we cast the …
Emotion intensity and its control for emotional voice conversion
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …
CopyPaste: An augmentation method for speech emotion recognition
Data augmentation is a widely used strategy for training robust machine learning models. It
partially alleviates the problem of limited data for tasks like speech emotion recognition …
End-to-end modeling and transfer learning for audiovisual emotion recognition in-the-wild
As emotions play a central role in human communication, automatic emotion recognition has
attracted increasing attention in the last two decades. While multimodal systems enjoy high …
Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition
In this paper, we explored how to boost speech emotion recognition (SER) with the state-of-
the-art speech pre-trained model (PTM), data2vec, text generation technique, GPT-4, and …
Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training
Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …
Towards General-Purpose Text-Instruction-Guided Voice Conversion
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …