A review on generative adversarial networks: Algorithms, theory, and applications

J Gui, Z Sun, Y Wen, D Tao, J Ye - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Generative adversarial networks (GANs) have recently become a hot research topic;
however, they have been studied since 2014, and a large number of algorithms have been …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

AutoVC: Zero-shot voice style transfer with only autoencoder loss

K Qian, Y Zhang, S Chang, X Yang… - International …, 2019 - proceedings.mlr.press
Despite the progress in voice conversion, many-to-many voice conversion trained on non-
parallel data, as well as zero-shot voice conversion, remains under-explored. Deep style …

StarGAN-VC: Non-parallel many-to-many voice conversion using star generative adversarial networks

H Kameoka, T Kaneko, K Tanaka… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC)
by using a variant of a generative adversarial network (GAN) called StarGAN. Our method …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

CycleGAN-VC2: Improved CycleGAN-based non-parallel voice conversion

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to
target speech without relying on parallel data. This is an important task, but it has been …

CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org
We propose a non-parallel voice-conversion (VC) method that can learn a mapping from
source to target speech without relying on parallel data. The proposed method is particularly …

VQMIVC: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arXiv preprint arXiv …, 2021 - arxiv.org
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …

One-shot voice conversion by separating speaker and content representations with instance normalization

J Chou, C Yeh, H Lee - arXiv preprint arXiv:1904.05742, 2019 - arxiv.org
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-
target scenario in which a single model is trained to convert the input voice to many different …

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press
Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …