Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks

CC Hsu, HT Hwang, YC Wu, Y Tsao… - arXiv preprint arXiv …, 2017 - arxiv.org
Building a voice conversion (VC) system from non-parallel speech corpora is challenging
but highly valuable in real application scenarios. In most situations, the source and the target …

Voice conversion from non-parallel corpora using variational auto-encoder

CC Hsu, HT Hwang, YC Wu, Y Tsao… - 2016 Asia-Pacific …, 2016 - ieeexplore.ieee.org
We propose a flexible framework for spectral conversion (SC) that facilitates training with
unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or …

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

F Fang, J Yamagishi, I Echizen… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Although voice conversion (VC) algorithms have achieved remarkable success along with
the development of machine learning, superior performance is still difficult to achieve when …

Non-parallel voice conversion with cyclic variational autoencoder

PL Tobing, YC Wu, T Hayashi, K Kobayashi… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present a novel technique for a non-parallel voice conversion (VC) with the
use of cyclic variational autoencoder (CycleVAE)-based spectral modeling. In a variational …

Catch you and I can: Revealing source voiceprint against voice conversion

J Deng, Y Chen, Y Zhong, Q Miao, X Gong… - 32nd USENIX Security …, 2023 - usenix.org
Voice conversion (VC) techniques can be abused by malicious parties to transform their
audios to sound like a target speaker, making it hard for a human being or a speaker …

Non-parallel training in voice conversion using an adaptive restricted Boltzmann machine

T Nakashika, T Takiguchi… - IEEE/ACM Transactions …, 2016 - ieeexplore.ieee.org
In this paper, we present a voice conversion (VC) method that does not use any parallel data
while training the model. VC is a technique where only speaker-specific information in …

One-Shot Voice Conversion with Disentangled Representations by Leveraging Phonetic Posteriorgrams

SH Mohammadi, T Kim - Interspeech, 2019 - isca-archive.org
We propose a voice conversion model from arbitrary source speaker to arbitrary target
speaker with disentangled representations. Voice conversion is a task to convert the voice of …

Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion

SH Mohammadi, T Kim - arXiv preprint arXiv:1808.05294, 2018 - arxiv.org
We study the problem of cross-lingual voice conversion in non-parallel speech corpora and
one-shot learning setting. Most prior work requires either parallel speech corpora or enough …

Speech synthesis from found data

P Baljekar - 2018 - kilthub.cmu.edu
Text-to-speech synthesis (TTS) has progressed to such a stage that given a large, clean,
phonetically balanced dataset from a single speaker, it can produce intelligible, almost …

Many-to-many unsupervised speech conversion from nonparallel corpora

YK Lee, HW Kim, JG Park - IEEE Access, 2021 - ieeexplore.ieee.org
We address a nonparallel data-driven many-to-many speech modeling and multimodal style
conversion method. In this work, we train a speech conversion model for multiple domains …