StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

Deepfake generation and detection: A benchmark and survey

G Pei, J Zhang, M Hu, Z Zhang, C Wang, Y Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deepfake is a technology dedicated to creating highly realistic facial images and videos
under specific conditions, which has significant application potential in fields such as …

DAE-Talker: High fidelity speech-driven talking face generation with diffusion autoencoder

C Du, Q Chen, T He, X Tan, X Chen, K Yu… - Proceedings of the 31st …, 2023 - dl.acm.org
While recent research has made significant progress in speech-driven talking face
generation, the quality of the generated video still lags behind that of real recordings. One …

FaceComposer: A unified model for versatile facial content creation

J Wang, K Zhao, Y Ma, S Zhang… - Advances in …, 2024 - proceedings.neurips.cc
This work presents FaceComposer, a unified generative model that accomplishes a variety
of facial content creation tasks, including text-conditioned face synthesis, text-guided face …

DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

Revisiting generalizability in deepfake detection: Improving metrics and stabilizing transfer

S Kamat, S Agarwal, T Darrell… - Proceedings of the …, 2023 - openaccess.thecvf.com
"Generalizability" is seen as the hallmark quality of a good deepfake detection
model. However, standard out-of-domain evaluation datasets are very similar in form to the …

GAIA: Zero-shot talking avatar generation

T He, J Guo, R Yu, Y Wang, J Zhu, K An, L Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech
and a single portrait image. Previous methods have relied on domain-specific heuristics …

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org
Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …

Expressive talking avatars

Y Pan, S Tan, S Cheng, Q Lin, Z Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Stylized avatars are common virtual representations used in VR to support interaction and
communication between remote collaborators. However, explicit expressions are notoriously …

Speech-driven 3D face animation with composite and regional facial movements

H Wu, S Zhou, J Jia, J Xing, Q Wen, X Wen - Proceedings of the 31st …, 2023 - dl.acm.org
Speech-driven 3D face animation poses significant challenges due to the intricacy and
variability inherent in human facial movements. This paper emphasizes the importance of …