Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Seeing what you said: Talking face generation guided by a lip reading expert

J Wang, X Qian, M Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …

Identity-preserving talking face generation with landmark and appearance priors

W Zhong, C Fang, Y Cai, P Wei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating talking face videos from audio attracts lots of research interest. A few person-
specific methods can generate vivid videos but require the target speaker's videos for …

StyleSync: High-fidelity generalized and personalized lip sync in style-based generator

J Guan, Z Zhang, H Zhou, T Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …

EMMN: Emotional motion memory network for audio-driven emotional talking face generation

S Tan, B Ji, Y Pan - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Synthesizing expression is essential to create realistic talking faces. Previous works
consider expressions and mouth shapes as a whole and predict them solely from audio …

LipFormer: High-fidelity and generalizable talking face generation with a pre-learned facial codebook

J Wang, K Zhao, S Zhang, Y Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a talking face video from the input audio sequence is a practical yet challenging
task. Most existing methods either fail to capture fine facial details or need to train a specific …

VideoReTalking: Audio-based lip synchronization for talking head video editing in the wild

K Cheng, X Cun, Y Zhang, M Xia, F Yin, M Zhu… - SIGGRAPH Asia 2022 …, 2022 - dl.acm.org
We present VideoReTalking, a new system to edit the faces of a real-world talking head
video according to input audio, producing a high-quality and lip-syncing output video even …

EDTalk: Efficient disentanglement for emotional talking head synthesis

S Tan, B Ji, M Bi, Y Pan - European Conference on Computer Vision, 2024 - Springer
Achieving disentangled control over multiple facial motions and accommodating diverse
input modalities greatly enhances the application and entertainment of the talking head …

Talking head generation with probabilistic audio-to-visual diffusion priors

Z Yu, Z Yin, D Zhou, D Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce a novel framework for one-shot audio-driven talking head generation. Unlike
prior works that require additional driving sources for controlled synthesis in a deterministic …

Audio-synchronized visual animation

L Zhang, S Mo, Y Zhang, P Morgado - European Conference on Computer …, 2024 - Springer
Current visual generation methods can produce high-quality videos guided by text prompts.
However, effectively controlling object dynamics remains a challenge. This work explores …