DreamTalk: When expressive talking head generation meets diffusion probabilistic models

Y Ma, S Zhang, J Wang, X Wang, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have shown remarkable success in a variety of downstream generative
tasks, yet remain under-explored in the important and challenging expressive talking head …

FaceDiffuser: Speech-driven 3D facial animation synthesis using diffusion

S Stan, KI Haque, Z Yumak - Proceedings of the 16th ACM SIGGRAPH …, 2023 - dl.acm.org
Speech-driven 3D facial animation synthesis has been a challenging task in both industry
and research. Recent approaches mostly focus on deterministic deep learning methods …

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

C Xu, Y Liu, J Xing, W Wang, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we abstract the process of people hearing speech, extracting meaningful cues,
and creating various dynamically audio-consistent talking faces, termed Listening and …

Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges

H Liz-Lopez, M Keita, A Taleb-Ahmed, A Hadid… - Information …, 2024 - Elsevier
Generative deep learning techniques have recently invaded public discourse. Despite
their advantages, their applications to disinformation are concerning, as the counter-measures …

AniTalker: Animate vivid and diverse talking faces through identity-decoupled facial motion encoding

T Liu, F Chen, S Fan, C Du, Q Chen, X Chen… - Proceedings of the 32nd …, 2024 - dl.acm.org
The paper introduces AniTalker, an innovative framework designed to generate lifelike
talking faces from a single portrait. Unlike existing models that primarily focus on verbal cues …

VASA-1: Lifelike audio-driven talking faces generated in real time

S Xu, G Chen, YX Guo, J Yang, C Li, Z Zang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce VASA, a framework for generating lifelike talking faces with appealing visual
affective skills (VAS) given a single static image and a speech audio clip. Our premiere …

GAIA: Zero-shot talking avatar generation

T He, J Guo, R Yu, Y Wang, J Zhu, K An, L Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot talking avatar generation aims at synthesizing natural talking videos from speech
and a single portrait image. Previous methods have relied on domain-specific heuristics …

Make your actor talk: Generalizable and high-fidelity lip sync with motion and appearance disentanglement

R Yu, T He, A Zhang, Y Wang, J Guo, X Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
We aim to edit the lip movements in a talking video according to the given speech while
preserving the personal identity and visual details. The task can be decomposed into two …