Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Deep stable learning for out-of-distribution generalization

X Zhang, P Cui, R Xu, L Zhou… - Proceedings of the …, 2021 - openaccess.thecvf.com
Approaches based on deep neural networks have achieved striking performance when
testing data and training data share similar distribution, but can significantly fail otherwise …

Unsupervised speech decomposition via triple information bottleneck

K Qian, Y Zhang, S Chang… - International …, 2020 - proceedings.mlr.press
Speech information can be roughly decomposed into four components: language content,
timbre, pitch, and rhythm. Obtaining disentangled representations of these components is …

Starganv2-vc: A diverse, unsupervised, non-parallel framework for natural-sounding voice conversion

YA Li, A Zare, N Mesgarani - arXiv preprint arXiv:2107.10394, 2021 - arxiv.org
We present an unsupervised non-parallel many-to-many voice conversion (VC) method
using a generative adversarial network (GAN) called StarGAN v2. Using a combination of …

Audio deepfake approaches

OA Shaaban, R Yildirim, AA Alguttar - IEEE Access, 2023 - ieeexplore.ieee.org
This paper presents a review of techniques involved in the creation and detection of audio
deepfakes, the first section provides information about general deep fakes. In the second …

Privacy-preserving voice analysis via disentangled representations

R Aloufi, H Haddadi, D Boyle - Proceedings of the 2020 ACM SIGSAC …, 2020 - dl.acm.org
Voice User Interfaces (VUIs) are increasingly popular and built into smartphones, home
assistants, and Internet of Things (IoT) devices. Despite offering an always-on convenient …

Anonymizing speech: Evaluating and designing speaker anonymization techniques

P Champion - arXiv preprint arXiv:2308.04455, 2023 - arxiv.org
The growing use of voice user interfaces has led to a surge in the collection and storage of
speech data. While data collection allows for the development of efficient tools powering …

Global prosody style transfer without text transcriptions

K Qian, Y Zhang, S Chang, J Xiong… - International …, 2021 - proceedings.mlr.press
Prosody plays an important role in characterizing the style of a speaker or an emotion, but
most non-parallel voice or emotion style transfer algorithms do not convert any prosody …

The sequence-to-sequence baseline for the voice conversion challenge 2020: Cascading asr and tts

WC Huang, T Hayashi, S Watanabe, T Toda - arXiv preprint arXiv …, 2020 - arxiv.org
This paper presents the sequence-to-sequence (seq2seq) baseline system for the voice
conversion challenge (VCC) 2020. We consider a naive approach for voice conversion (VC) …