Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Application of a 3D Talking Head as Part of Telecommunication AR, VR, MR System: Systematic Review

N Christoff, NN Neshov, K Tonchev, A Manolova - Electronics, 2023 - mdpi.com
In today's digital era, the realms of virtual reality (VR), augmented reality (AR), and mixed
reality (MR), collectively referred to as extended reality (XR), are reshaping human–computer …

GaussianTalker: Speaker-specific talking head synthesis via 3D Gaussian splatting

H Yu, Z Qu, Q Yu, J Chen, Z Jiang, Z Chen… - Proceedings of the …, 2024 - dl.acm.org
Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF)
have achieved impressive results. However, due to inadequate pose and expression control …

GaussianTalker: Real-time talking head synthesis with 3D Gaussian splatting

K Cho, J Lee, H Yoon, Y Hong, J Ko, S Ahn… - Proceedings of the 32nd …, 2024 - dl.acm.org
This paper proposes GaussianTalker, a novel framework for real-time generation of pose-
controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian …

Real3D-Portrait: One-shot realistic 3D talking portrait synthesis

Z Ye, T Zhong, Y Ren, J Yang, W Li, J Huang… - arxiv preprint arxiv …, 2024 - arxiv.org
One-shot 3D talking portrait generation aims to reconstruct a 3D avatar from an unseen
image, and then animate it with a reference video or audio to generate a talking portrait …

Recent advances in implicit representation-based 3D shape generation

JM Sun, T Wu, L Gao - Visual Intelligence, 2024 - Springer
Various techniques have been developed and introduced to address the pressing need to
create three-dimensional (3D) content for advanced applications such as virtual reality and …

HyperLips: Hyper control lips with high resolution decoder for talking face generation

Y Chen, Y Yao, Z Li, W Wang, Y Zhang, H Yang… - Applied …, 2025 - Springer
Talking face generation has a wide range of potential applications in the field of virtual
digital humans. However, rendering high-fidelity facial video while ensuring lip …

RAGDiffusion: Faithful cloth generation via external knowledge assimilation

X Tan, Y Li, W Shang, Y Wu, J Wang, X Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
Standard clothing asset generation involves creating forward-facing flat-lay garment images
displayed on a clear background by extracting clothing information from diverse real-world …

DreamHead: Learning spatial-temporal correspondence via hierarchical diffusion for audio-driven talking head synthesis

FT Hong, Y Liu, Y Li, C Zhou, F Yu, D Xu - arxiv preprint arxiv:2409.10281, 2024 - arxiv.org
Audio-driven talking head synthesis strives to generate lifelike video portraits from provided
audio. The diffusion model, recognized for its superior quality and robust generalization, has …

AniPortrait: Audio-driven synthesis of photorealistic portrait animation

H Wei, Z Yang, Z Wang - arxiv preprint arxiv:2403.17694, 2024 - arxiv.org
In this study, we propose AniPortrait, a novel framework for generating high-quality
animation driven by audio and a reference portrait image. Our methodology is divided into …