An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Bangla natural language processing: A comprehensive analysis of classical, machine learning, and deep learning-based methods

O Sen, M Fuad, MN Islam, J Rabbi, M Masud… - IEEE …, 2022 - ieeexplore.ieee.org
The Bangla language is the seventh most spoken language, with 265 million native and non-
native speakers worldwide. However, English is the predominant language for online …

One-shot voice conversion by separating speaker and content representations with instance normalization

J Chou, C Yeh, H Lee - arXiv preprint arXiv:1904.05742, 2019 - arxiv.org
Recently, voice conversion (VC) without parallel data has been successfully adapted to the multi-
target scenario, in which a single model is trained to convert the input voice to many different …

Sequence-to-sequence acoustic modeling for voice conversion

JX Zhang, ZH Ling, LJ Liu, Y Jiang… - IEEE/ACM Transactions …, 2019 - ieeexplore.ieee.org
In this paper, a neural network named sequence-to-sequence ConvErsion NeTwork
(SCENT) is presented for acoustic modeling in voice conversion. At the training stage, a SCENT …

JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis

R Sonobe, S Takamichi, H Saruwatari - arXiv preprint arXiv:1711.00354, 2017 - arxiv.org
Thanks to improvements in machine learning techniques, including deep learning, a free
large-scale speech corpus that can be shared between academic institutions and …

Non-parallel sequence-to-sequence voice conversion with disentangled linguistic and speaker representations

JX Zhang, ZH Ling, LR Dai - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
This article presents a method of sequence-to-sequence (seq2seq) voice conversion using
non-parallel training data. In this method, disentangled linguistic and speaker …

Multi-target voice conversion without parallel data by adversarially learning disentangled audio representations

J Chou, C Yeh, H Lee, L Lee - arXiv preprint arXiv:1804.02812, 2018 - arxiv.org
Recently, the cycle-consistent adversarial network (Cycle-GAN) has been successfully applied
to voice conversion to a different speaker without parallel data, although in those …

Voice transformer network: Sequence-to-sequence voice conversion using transformer with text-to-speech pretraining

WC Huang, T Hayashi, YC Wu, H Kameoka… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a novel sequence-to-sequence (seq2seq) voice conversion (VC) model based
on the Transformer architecture with text-to-speech (TTS) pretraining. Seq2seq VC models …

AttS2S-VC: Sequence-to-sequence voice conversion with attention and context preservation mechanisms

K Tanaka, H Kameoka, T Kaneko… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper describes a method based on sequence-to-sequence (Seq2Seq) learning with
attention and a context preservation mechanism for voice conversion (VC) tasks. Seq2Seq …

Introduction to voice presentation attack detection and recent advances

M Sahidullah, H Delgado, M Todisco, A Nautsch… - Handbook of Biometric …, 2023 - Springer
Over the past few years, significant progress has been made in the field of presentation
attack detection (PAD) for automatic speaker verification (ASV). This includes the …