Pros and cons of GAN evaluation measures: New developments

A Borji - Computer Vision and Image Understanding, 2022 - Elsevier
This work is an update of my previous paper on the same topic published a few years ago
(Borji, 2019). With the dramatic progress in generative modeling, a suite of new quantitative …

Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

VASA-1: Lifelike audio-driven talking faces generated in real time

S Xu, G Chen, YX Guo, J Yang, C Li… - Advances in …, 2025 - proceedings.neurips.cc
We introduce VASA, a framework for generating lifelike talking faces with appealing visual
affective skills (VAS) given a single static image and a speech audio clip. Our premiere …

CodeTalker: Speech-driven 3D facial animation with discrete motion prior

J Xing, M Xia, Y Zhang, X Cun… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …

Pose-controllable talking face generation by implicitly modularized audio-visual representation

H Zhou, Y Sun, W Wu, CC Loy… - Proceedings of the …, 2021 - openaccess.thecvf.com
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …

Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset

Z Zhang, L Li, Y Ding, C Fan - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
One-shot talking face generation should synthesize high visual quality facial videos with
reasonable animations of expression and head pose, and just utilize arbitrary driving audio …

A lip sync expert is all you need for speech to lip generation in the wild

KR Prajwal, R Mukhopadhyay, VP Namboodiri… - Proceedings of the 28th …, 2020 - dl.acm.org
In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary
identity to match a target speech segment. Current works excel at producing accurate lip …

StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

Expressive talking head generation with granular audio-visual control

B Liang, Y Pan, Z Guo, H Zhou… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …