A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

S3PRL-VC: Open-source voice conversion framework with self-supervised speech representations

WC Huang, SW Yang, T Hayashi… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based
on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech …

NVC-Net: End-to-end adversarial voice conversion

B Nguyen, F Cardinaux - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Voice conversion (VC) has gained increasing popularity in many speech synthesis
applications. The idea is to change the voice identity from one speaker into another while …

A comparative study of self-supervised speech representation based voice conversion

WC Huang, SW Yang, T Hayashi… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
We present a large-scale comparative study of self-supervised speech representation (S3R)-
based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive …

MSM-VC: high-fidelity source style transfer for non-parallel voice conversion by multi-scale style modeling

Z Wang, X Wang, Q Xie, T Li, L Xie… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
In addition to conveying the linguistic content from source speech to converted speech,
maintaining the speaking style of source speech also plays an important role in the voice …

Enriching source style transfer in recognition-synthesis based non-parallel voice conversion

Z Wang, X Zhou, F Yang, T Li, H Du, L Xie… - arXiv preprint arXiv …, 2021 - arxiv.org
Current voice conversion (VC) methods can successfully convert timbre of the audio. As
modeling source audio's prosody effectively is a challenging task, there are still limitations of …

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

M Proszewska, G Beringer, D Sáez-Trigueros… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we propose GlowVC: a multilingual multi-speaker flow-based model for
language-independent text-free voice conversion. We build on Glow-TTS, which provides an …

Efficient non-autoregressive GAN voice conversion using vqwav2vec features and dynamic convolution

M Chen, Y Zhou, H Huang, T Hain - arXiv preprint arXiv:2203.17172, 2022 - arxiv.org
It was shown recently that a combination of ASR and TTS models yields highly competitive
performance on standard voice conversion tasks such as the Voice Conversion Challenge …

Synthesis speech based data augmentation for low resource children ASR

V Kadyan, H Kathania, P Govil, M Kurimo - Speech and Computer: 23rd …, 2021 - Springer
Successful speech recognition for children requires large training data with sufficient
speaker variability. The collection of such a training database of children's voices is …

Optimization of cross-lingual voice conversion with linguistics losses to reduce foreign accents

Y Zhou, Z Wu, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Cross-lingual voice conversion (XVC) transforms the speaker identity of a source speaker to
that of a target speaker who speaks a different language. Due to the intrinsic differences …