Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

USAT: A universal speaker-adaptive text-to-speech approach

W Wang, Y Song, S Jha - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Conventional text-to-speech (TTS) research has predominantly focused on enhancing the
quality of synthesized speech for speakers in the training dataset. The challenge of …

TDASS: Target domain adaptation speech synthesis framework for multi-speaker low-resource TTS

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly
demanded. But the previous TTS models require a mass of target speaker speeches for …

MetaSID: Singer identification with domain adaptation for metaverse

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Metaverse has stretched the real world into unlimited space. There will be more live concerts
in Metaverse. The task of singer identification is to identify the song belongs to which singer …

Adaptive transformer-based conditioned variational autoencoder for incomplete social event classification

Z Li, S Qian, J Cao, Q Fang, C Xu - Proceedings of the 30th ACM …, 2022 - dl.acm.org
With the rapid development of the Internet and the expanding scale of social media,
incomplete social event classification has increasingly become a challenging task. The key …

SUSing: SU-net for singing voice synthesis

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Singing voice synthesis is a generative task that involves multi-dimensional control of the
singing model, including lyrics, pitch, and duration, and includes the timbre of the singer and …

FVTTS: Face based voice synthesis for text-to-speech

M Lee, E Park, S Hong - Proc. Interspeech 2024, 2024 - isca-archive.org
A face is expressive of individual identity and used in various studies such as identification,
authentication, and personalization. Similarly, a voice is a means of expressing individuals …

Pose guided human image synthesis with partially decoupled GAN

J Wu, S Si, J Wang, X Qu, X **g - Asian Conference on …, 2023 - proceedings.mlr.press
Pose Guided Human Image Synthesis (PGHIS) is a challenging task of transforming
a human image from the reference pose to a target pose while preserving its style. Most …

Semi-supervised learning based on reference model for low-resource TTS

X Zhang, J Wang, N Cheng… - 2022 18th International …, 2022 - ieeexplore.ieee.org
Most previous neural text-to-speech (TTS) methods are mainly based on supervised
learning methods, which means they depend on a large training dataset and hard to achieve …

MDCNN-SID: Multi-scale dilated convolution network for singer identification

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Most singer identification methods are processed in the frequency domain, which potentially
leads to information loss during the spectral transformation. In this paper, instead of the …