Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role in the creation and perception of multimodal …
Seeing what you said: Talking face generation guided by a lip reading expert
Talking face generation, also known as speech-to-lip generation, reconstructs facial motions
concerning lips given coherent speech input. The previous studies revealed the importance …
Identity-preserving talking face generation with landmark and appearance priors
Generating talking face videos from audio attracts lots of research interest. A few person-
specific methods can generate vivid videos but require the target speaker's videos for …
StyleSync: High-fidelity generalized and personalized lip sync in style-based generator
Despite recent advances in syncing lip movements with any audio waves, current methods
still struggle to balance generation quality and the model's generalization ability. Previous …
EMMN: Emotional motion memory network for audio-driven emotional talking face generation
Synthesizing expression is essential to create realistic talking faces. Previous works
consider expressions and mouth shapes as a whole and predict them solely from audio …
LipFormer: High-fidelity and generalizable talking face generation with a pre-learned facial codebook
Generating a talking face video from the input audio sequence is a practical yet challenging
task. Most existing methods either fail to capture fine facial details or need to train a specific …
VideoReTalking: Audio-based lip synchronization for talking head video editing in the wild
We present VideoReTalking, a new system to edit the faces of a real-world talking head
video according to input audio, producing a high-quality and lip-syncing output video even …
EDTalk: Efficient disentanglement for emotional talking head synthesis
Achieving disentangled control over multiple facial motions and accommodating diverse
input modalities greatly enhances the application and entertainment of the talking head …
Talking head generation with probabilistic audio-to-visual diffusion priors
We introduce a novel framework for one-shot audio-driven talking head generation. Unlike
prior works that require additional driving sources for controlled synthesis in a deterministic …
Audio-synchronized visual animation
Current visual generation methods can produce high-quality videos guided by text prompts.
However, effectively controlling object dynamics remains a challenge. This work explores …