Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while kee** the linguistic …

Seen and unseen emotional style transfer for voice conversion with a new emotional speech dataset

K Zhou, B Sisman, R Liu, H Li - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Emotional voice conversion aims to transform emotional prosody in speech while preserving
the linguistic content and speaker identity. Prior studies show that it is possible to …

Stargan-vc: Non-parallel many-to-many voice conversion using star generative adversarial networks

H Kameoka, T Kaneko, K Tanaka… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
This paper proposes a method that allows non-parallel many-to-many voice conversion (VC)
by using a variant of a generative adversarial network (GAN) called StarGAN. Our method …

Biomimetic and flexible piezoelectric mobile acoustic sensors with multiresonant ultrathin structures for machine learning biometrics

HS Wang, SK Hong, JH Han, YH Jung, HK Jeong… - Science …, 2021 - science.org
Flexible resonant acoustic sensors have attracted substantial attention as an essential
component for intuitive human-machine interaction (HMI) in the future voice user interface …

Emotional voice conversion: Theory, databases and ESD

K Zhou, B Sisman, R Liu, H Li - Speech Communication, 2022 - Elsevier
In this paper, we first provide a review of the state-of-the-art emotional voice conversion
research, and the existing emotional speech databases. We then motivate the development …

Cyclegan-vc2: Improved cyclegan-based non-parallel voice conversion

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for learning the map** from source to
target speech without relying on parallel data. This is an important task, but it has been …

Cyclegan-vc: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org
We propose a non-parallel voice-conversion (VC) method that can learn a map** from
source to target speech without relying on parallel data. The proposed method is particularly …

The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods

J Lorenzo-Trueba, J Yamagishi, T Toda, D Saito… - arxiv preprint arxiv …, 2018 - arxiv.org
We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016
edition with the aim of providing a common framework for evaluating and comparing …

Vqmivc: Vector quantization and mutual information-based unsupervised speech representation disentanglement for one-shot voice conversion

D Wang, L Deng, YT Yeung, X Chen, X Liu… - arxiv preprint arxiv …, 2021 - arxiv.org
One-shot voice conversion (VC), which performs conversion across arbitrary speakers with
only a single target-speaker utterance for reference, can be effectively achieved by speech …