VASA-1: Lifelike audio-driven talking faces generated in real time

S Xu, G Chen, YX Guo, J Yang, C Li… - Advances in …, 2025 - proceedings.neurips.cc
We introduce VASA, a framework for generating lifelike talking faces with appealing visual
affective skills (VAS) given a single static image and a speech audio clip. Our premiere …

FSRT: Facial scene representation transformer for face reenactment from factorized appearance, head-pose, and facial expression features

A Rochow, M Schwarz… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The task of face reenactment is to transfer the head motion and facial expressions from a
driving video to the appearance of a source image which may be of a different person (cross …

Probabilistic speech-driven 3D facial motion synthesis: new benchmarks, methods, and applications

KD Yang, A Ranjan, JHR Chang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We consider the task of animating 3D facial geometry from speech signal. Existing works are
primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D …

From pixels to portraits: A comprehensive survey of talking head generation techniques and applications

SN Gowda, D Pandey, SN Gowda - arXiv preprint arXiv:2308.16041, 2023 - arxiv.org
Recent advancements in deep learning and computer vision have led to a surge of interest
in generating realistic talking heads. This paper presents a comprehensive survey of state-of …

FaceComposer: A unified model for versatile facial content creation

J Wang, K Zhao, Y Ma, S Zhang… - Advances in …, 2023 - proceedings.neurips.cc
This work presents FaceComposer, a unified generative model that accomplishes a variety
of facial content creation tasks, including text-conditioned face synthesis, text-guided face …

Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation

D Yaman, FI Eyiokur, L Bärmann… - … of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In the task of talking face generation, the objective is to generate a face video with lips
synchronized to the corresponding audio while preserving visual details and identity …