An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset

K Zhou, B Sisman, R Liu, H Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Emotional voice conversion aims to transform emotional prosody in speech while preserving
the linguistic content and speaker identity. Prior studies show that it is possible to …

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Transforming spectrum and prosody for emotional voice conversion with non-parallel training data

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2002.00198, 2020 - arxiv.org
Emotional voice conversion aims to convert the spectrum and prosody to change the
emotional patterns of speech, while preserving the speaker identity and linguistic content …

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

Converting anyone's emotion: Towards speaker-independent emotional voice conversion

K Zhou, B Sisman, M Zhang, H Li - arXiv preprint arXiv:2005.07025, 2020 - arxiv.org
Emotional voice conversion aims to convert the emotion of speech from one state to another
while preserving the linguistic content and speaker identity. The prior studies on emotional …

Modified magnitude-phase spectrum information for spoofing detection

J Yang, H Wang, RK Das, Y Qian - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Most of the existing feature representations for spoofing countermeasures consider
information either from the magnitude or phase spectrum. We hypothesize that both …

Teacher-student training for robust Tacotron-based TTS

R Liu, B Sisman, J Li, F Bao, G Gao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
While neural end-to-end text-to-speech (TTS) is superior to conventional statistical methods
in many ways, the exposure bias problem in the autoregressive models remains an issue to …

Limited data emotional voice conversion leveraging text-to-speech: Two-stage sequence-to-sequence training

K Zhou, B Sisman, H Li - arXiv preprint arXiv:2103.16809, 2021 - arxiv.org
Emotional voice conversion (EVC) aims to change the emotional state of an utterance while
preserving the linguistic content and speaker identity. In this paper, we propose a novel 2 …