Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …
Human-computer interaction system: A survey of talking-head generation
Virtual human is widely employed in various industries, including personal assistance,
intelligent customer service, and online education, thanks to the rapid development of …
HumanGaussian: Text-driven 3D human generation with Gaussian splatting
Realistic 3D human generation from text prompts is a desirable yet challenging task.
Existing methods optimize 3D representations like mesh or neural fields via score distillation …
Gaussian head avatar: Ultra high-fidelity head avatar via dynamic Gaussians
Creating high-fidelity 3D head avatars has always been a research hotspot but there
remains a great challenge under lightweight sparse view setups. In this paper we propose …
CodeTalker: Speech-driven 3D facial animation with discrete motion prior
Speech-driven 3D facial animation has been widely studied, yet there is still a gap to
achieving realism and vividness due to the highly ill-posed nature and scarcity of audio …
Expressive talking head generation with granular audio-visual control
Generating expressive talking heads is essential for creating virtual humans. However,
existing one- or few-shot methods focus on lip-sync and head motion, ignoring the emotional …
Reconstructing personalized semantic facial NeRF models from monocular video
We present a novel semantic model for human head defined with neural radiance field. The
3D-consistent head model consists of a set of disentangled and interpretable bases, and can …
DiffTalk: Crafting diffusion models for generalized audio-driven portraits animation
Talking head synthesis is a promising approach for the video production industry. Recently,
a lot of effort has been devoted in this research area to improve the generation quality or …
Identity-preserving talking face generation with landmark and appearance priors
Generating talking face videos from audio attracts lots of research interest. A few person-
specific methods can generate vivid videos but require the target speaker's videos for …
Learning hierarchical cross-modal association for co-speech gesture generation
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …