Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

VISinger 2: High-fidelity end-to-end singing voice synthesis enhanced by digital signal processing synthesizer

Y Zhang, H Xue, H Li, L Xie, T Guo, R Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
End-to-end singing voice synthesis (SVS) model VISinger can achieve better performance
than the typical two-stage model with fewer parameters. However, VISinger has several …

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

Converting foreign accent speech without a reference

G Zhao, S Ding… - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Foreign accent conversion (FAC) is the problem of generating a synthetic voice that has the
voice identity of a second-language (L2) learner and the pronunciation patterns of a native …

Prompt-Singer: Controllable singing-voice-synthesis with natural language prompt

Y Wang, R Hu, R Huang, Z Hong, R Li, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent singing-voice-synthesis (SVS) methods have achieved remarkable audio quality and
naturalness, yet they lack the capability to control the style attributes of the synthesized …

Data augmentation for children ASR and child-adult speaker classification using voice conversion methods

Z Shuyang, M Singh, A Woubie, R Karhila - Proc. Interspeech, 2023 - isca-archive.org
Many young children prefer speech based interfaces over text, as they are relatively slow
and error-prone with text input. However, children ASR can be challenging due to the lack of …

Acoustic tracking of pitch, modal, and subharmonic vibrations of vocal folds in Parkinson's disease and parkinsonism

J Hlavnička, R Čmejla, J Klempíř, E Růžička… - IEEE Access, 2019 - ieeexplore.ieee.org
The prominent and early presence of dysphonia is considered a valuable marker for
differentiation of idiopathic Parkinson's disease and parkinsonian syndromes. Objective …

Speech synthesis from articulatory movements recorded by real-time MRI

Y Otani, S Sawada, H Ohmura, K Katsurada - Proc. Interspeech, 2023 - isca-archive.org
Previous speech synthesis models from articulatory movements recorded using real-time
MRI (rtMRI) only predicted vocal tract shape parameters and required additional pitch …

A comparative study of voice conversion models with large-scale speech and singing data: The T13 systems for the singing voice conversion challenge 2023

R Yamamoto, R Yoneyama, LP Violeta… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper presents our systems (denoted as T13) for the singing voice conversion
challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice …