Pros and cons of GAN evaluation measures: New developments
A Borji - Computer Vision and Image Understanding, 2022 - Elsevier
This work is an update of my previous paper on the same topic published a few years ago
(Borji, 2019). With the dramatic progress in generative modeling, a suite of new quantitative …
Deep audio-visual learning: A survey
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …
VASA-1: Lifelike audio-driven talking faces generated in real time
We introduce VASA, a framework for generating lifelike talking faces with appealing visual
affective skills (VAS) given a single static image and a speech audio clip. Our premier …
CodeTalker: Speech-driven 3D facial animation with discrete motion prior
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …
Learning audio-visual speech representation by masked multimodal cluster prediction
Video recordings of speech contain correlated audio and visual information, providing a
strong signal for speech representation learning from the speaker's lip movements and the …
Pose-controllable talking face generation by implicitly modularized audio-visual representation
While accurate lip synchronization has been achieved for arbitrary-subject audio-driven
talking face generation, the problem of how to efficiently drive the head pose remains …
Flow-guided one-shot talking face generation with a high-resolution audio-visual dataset
One-shot talking face generation should synthesize high visual quality facial videos with
reasonable animations of expression and head pose, and just utilize arbitrary driving audio …
A lip sync expert is all you need for speech to lip generation in the wild
In this work, we investigate the problem of lip-syncing a talking face video of an arbitrary
identity to match a target speech segment. Current works excel at producing accurate lip …
StyleSync: High-fidelity generalized and personalized lip sync in style-based generator
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …
Expressive talking head generation with granular audio-visual control
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …