Champ: Controllable and consistent human image animation with 3D parametric guidance

S Zhu, JL Chen, Z Dai, Z Dong, Y Xu, X Cao… - … on Computer Vision, 2024 - Springer
In this study, we introduce a methodology for human image animation by leveraging a 3D
human parametric model within a latent diffusion framework to enhance shape alignment …

Animate Anyone: Consistent and controllable image-to-video synthesis for character animation

L Hu - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Character Animation aims to generate character videos from still images through driving
signals. Currently, diffusion models have become the mainstream in visual generation …

Grounded SAM: Assembling open-world models for diverse visual tasks

T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Grounded SAM, which combines Grounding DINO, an open-set object detector,
with the Segment Anything Model (SAM). This integration enables the detection and …
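The snippet above describes a two-stage pipeline: open-set detection followed by promptable segmentation. As a minimal sketch of the second stage, the boxes produced by such a detector can be passed to SAM as box prompts using the standard segment_anything predictor API; the boxes argument below stands in for Grounding DINO output, and the model type and checkpoint path are placeholders.

    import numpy as np
    from segment_anything import sam_model_registry, SamPredictor

    # Load a SAM checkpoint (model type and path are placeholders).
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth")
    predictor = SamPredictor(sam)

    def boxes_to_masks(image, boxes):
        """Convert detector boxes (XYXY pixel coords) into per-object SAM masks.

        In a Grounded-SAM-style pipeline the boxes would come from an open-set
        detector such as Grounding DINO prompted with free-text labels; here
        they are simply taken as an array of box coordinates.
        """
        predictor.set_image(image)          # image: HxWx3 uint8 RGB array
        masks = []
        for box in boxes:
            m, _, _ = predictor.predict(
                box=np.asarray(box),        # one box prompt per detected object
                multimask_output=False,     # keep a single mask per box
            )
            masks.append(m[0])              # boolean HxW mask
        return masks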

MOFA-Video: Controllable image animation via generative motion field adaptions in frozen image-to-video diffusion model

M Niu, X Cun, X Wang, Y Zhang, Y Shan… - European Conference on …, 2024 - Springer
We present MOFA-Video, an advanced controllable image animation method that generates
video from the given image using various additional controllable signals (such as human …

PhysAvatar: Learning the physics of dressed 3D avatars from visual observations

Y Zheng, Q Zhao, G Yang, W Yifan, D Xiang… - … on Computer Vision, 2024 - Springer
Modeling and rendering photorealistic avatars is of crucial importance in many applications.
Existing methods that build a 3D avatar from visual observations, however, struggle to …

Follow-Your-Emoji: Fine-controllable and expressive freestyle portrait animation

Y Ma, H Liu, H Wang, H Pan, Y He, J Yuan… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer
We present Sapiens, a family of models for four fundamental human-centric vision tasks–2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

Wear-Any-Way: Manipulable virtual try-on via sparse correspondence alignment

M Chen, X Chen, Z Zhai, C Ju, X Hong, J Lan… - … on Computer Vision, 2024 - Springer
This paper introduces a novel framework for virtual try-on, termed Wear-Any-Way. Different
from previous methods, Wear-Any-Way is a customizable solution. Besides generating high …

Neural interactive keypoint detection

J Yang, A Zeng, F Li, S Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
This work proposes an end-to-end neural interactive keypoint detection framework named
Click-Pose, which can reduce by more than 10 times the labeling costs of 2D keypoint …

MimicMotion: High-quality human motion video generation with confidence-aware pose guidance

Y Zhang, J Gu, LW Wang, H Wang, J Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, generative artificial intelligence has achieved significant advancements in
the field of image generation, spawning a variety of applications. However, video generation …