SyncTalk: The devil is in the synchronization for talking head synthesis

Z Peng, W Hu, Y Shi, X Zhu, X Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Achieving high synchronization in the synthesis of realistic speech-driven talking head
videos presents a significant challenge. Traditional Generative Adversarial Networks (GANs) …

FaceChain-ImagineID: Freely crafting high-fidelity diverse talking faces from disentangled audio

C Xu, Y Liu, J Xing, W Wang, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we abstract the process of people hearing speech, extracting meaningful cues,
and creating various dynamically audio-consistent talking faces, termed Listening and …

FlowVQTalker: High-quality emotional talking face generation through normalizing flow and quantization

S Tan, B Ji, Y Pan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Generating emotional talking faces is a practical yet challenging endeavor. To create a
lifelike avatar, we draw upon two critical insights from a human perspective: 1) The …

Deepfake generation and detection: A benchmark and survey

G Pei, J Zhang, M Hu, Z Zhang, C Wang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake is a technology dedicated to creating highly realistic facial images and videos
under specific conditions, which has significant application potential in fields such as …

Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution

Y Jin, B Lin, W Yan, Y Yuan, W Ye, RT Tan - Proceedings of the 31st …, 2023 - dl.acm.org
Visibility in hazy nighttime scenes is frequently reduced by multiple factors, including low
light, intense glow, light scattering, and the presence of multicolored light sources. Existing …

EDTalk: Efficient disentanglement for emotional talking head synthesis

S Tan, B Ji, M Bi, Y Pan - European Conference on Computer Vision, 2024 - Springer
Achieving disentangled control over multiple facial motions and accommodating diverse
input modalities greatly enhances the application and entertainment of the talking head …

Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

SelfTalk: A self-supervised commutative training diagram to comprehend 3D talking faces

Z Peng, Y Luo, Y Shi, H Xu, X Zhu, H Liu, J He… - Proceedings of the 31st …, 2023 - dl.acm.org
Speech-driven 3D face animation is a technique whose applications extend to various
multimedia fields. Previous research has generated promising realistic lip movements and …

AV-Deepfake1M: A large-scale LLM-driven audio-visual deepfake dataset

Z Cai, S Ghosh, AP Adatia, M Hayat, A Dhall… - Proceedings of the …, 2024 - dl.acm.org
The detection and localization of highly realistic deepfake audio-visual content are
challenging even for the most advanced state-of-the-art methods. While most of the research …

VLOGGER: Multimodal diffusion for embodied avatar synthesis

E Corona, A Zanfir, EG Bazavan, N Kolotouros… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose VLOGGER, a method for audio-driven human video generation from a single
input image of a person, which builds on the success of recent generative diffusion models …