Follow your pose: Pose-guided text-to-video generation using pose-free videos

Y Ma, Y He, X Cun, X Wang, S Chen, X Li… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Generating text-editable and pose-controllable character videos is in high demand for creating various digital humans. Nevertheless, this task has been restricted by the absence …

MotionLCM: Real-time controllable motion generation via latent consistency model

W Dai, LH Chen, J Wang, J Liu, B Dai… - European Conference on …, 2024 - Springer
This work introduces MotionLCM, extending controllable motion generation to a real-time
level. Existing methods for spatial-temporal control in text-conditioned motion generation …

Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects

C He, K Li, Y Zhang, Y Zhang, Z Guo, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into their surroundings. Despite achieving remarkable success, existing …

Follow-your-emoji: Fine-controllable and expressive freestyle portrait animation

Y Ma, H Liu, H Wang, H Pan, Y He, J Yuan… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …

Using human feedback to fine-tune diffusion models without any reward model

K Yang, J Tao, J Lyu, C Ge, J Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Using reinforcement learning with human feedback (RLHF) has shown significant promise in
fine-tuning diffusion models. Previous methods start by training a reward model that aligns …

HumanTOMATO: Text-aligned whole-body motion generation

S Lu, LH Chen, A Zeng, J Lin, R Zhang, L Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This work targets a novel text-driven whole-body motion generation task, which takes a
given textual description as input and aims at generating high-quality, diverse, and coherent …

Chain of generation: Multi-modal gesture synthesis via cascaded conditional control

Z Xu, Y Zhang, S Yang, R Li, X Li - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
This study aims to improve the generation of 3D gestures by utilizing multimodal information
from human speech. Previous studies have focused on incorporating additional modalities …

Lodge: A coarse-to-fine diffusion network for long dance generation guided by the characteristic dance primitives

R Li, YX Zhang, Y Zhang, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose Lodge, a network capable of generating extremely long dance sequences conditioned on given music. We design Lodge as a two-stage coarse-to-fine diffusion …

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Z Wang, J Jia, S Sun, H Wu, R Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Choreographers determine what dances look like, while cameramen determine the final presentation of those dances. Recently, various methods and datasets have showcased the …

Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation

J Liu, W Dai, C Wang, Y Cheng, Y Tang… - European Conference on …, 2024 - Springer
Conventional text-to-motion generation methods are usually trained on limited text-motion pairs, making it hard for them to generalize to open-vocabulary scenarios. Some works use the …