An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Unsupervised learning of disentangled and interpretable representations from sequential data

WN Hsu, Y Zhang, J Glass - Advances in neural information …, 2017 - proceedings.neurips.cc
We present a factorized hierarchical variational autoencoder, which learns disentangled and
interpretable representations from sequential data without supervision. Specifically, we …

One-shot voice conversion by separating speaker and content representations with instance normalization

J Chou, C Yeh, H Lee - arXiv preprint arXiv:1904.05742, 2019 - arxiv.org
Recently, voice conversion (VC) without parallel data has been successfully adapted to multi-
target scenario in which a single model is trained to convert the input voice to many different …