A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
S3PRL-VC: Open-source voice conversion framework with self-supervised speech representations
This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based
on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech …
NVC-Net: End-to-end adversarial voice conversion
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …
A comparative study of self-supervised speech representation based voice conversion
We present a large-scale comparative study of self-supervised speech representation (S3R)-
based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive …
MSM-VC: high-fidelity source style transfer for non-parallel voice conversion by multi-scale style modeling
In addition to conveying the linguistic content from source speech to converted speech,
maintaining the speaking style of source speech also plays an important role in the voice …
Enriching source style transfer in recognition-synthesis based non-parallel voice conversion
Current voice conversion (VC) methods can successfully convert timbre of the audio. As
modeling source audio's prosody effectively is a challenging task, there are still limitations of …
GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion
In this paper, we propose GlowVC: a multilingual multi-speaker flow-based model for
language-independent text-free voice conversion. We build on Glow-TTS, which provides an …
Efficient non-autoregressive GAN voice conversion using vqwav2vec features and dynamic convolution
It was shown recently that a combination of ASR and TTS models yield highly competitive
performance on standard voice conversion tasks such as the Voice Conversion Challenge …
Synthesis speech based data augmentation for low resource children ASR
Successful speech recognition for children requires large training data with sufficient
speaker variability. The collection of such a training database of children's voices is …
Optimization of cross-lingual voice conversion with linguistics losses to reduce foreign accents
Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to
that of a target speaker who speaks a different language. Due to the intrinsic differences …