Follow your pose: Pose-guided text-to-video generation using pose-free videos
Generating text-editable and pose-controllable character videos is in urgent demand
for creating various digital humans. Nevertheless, this task has been restricted by the absence …
Motionlcm: Real-time controllable motion generation via latent consistency model
This work introduces MotionLCM, extending controllable motion generation to a real-time
level. Existing methods for spatial-temporal control in text-conditioned motion generation …
Strategic preys make acute predators: Enhancing camouflaged object detectors by generating camouflaged objects
Camouflaged object detection (COD) is the challenging task of identifying camouflaged
objects visually blended into their surroundings. Despite achieving remarkable success, existing …
Follow-your-emoji: Fine-controllable and expressive freestyle portrait animation
We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which
animates a reference portrait with target landmark sequences. The main challenge of portrait …
Using human feedback to fine-tune diffusion models without any reward model
Using reinforcement learning with human feedback (RLHF) has shown significant promise in
fine-tuning diffusion models. Previous methods start by training a reward model that aligns …
Humantomato: Text-aligned whole-body motion generation
This work targets a novel text-driven whole-body motion generation task, which takes a
given textual description as input and aims at generating high-quality, diverse, and coherent …
Chain of generation: Multi-modal gesture synthesis via cascaded conditional control
This study aims to improve the generation of 3D gestures by utilizing multimodal information
from human speech. Previous studies have focused on incorporating additional modalities …
Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives
We propose Lodge, a network capable of generating extremely long dance sequences
conditioned on given music. We design Lodge as a two-stage coarse-to-fine diffusion …
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Choreographers determine what the dances look like, while cameramen determine the final
presentation of dances. Recently, various methods and datasets have showcased the …
Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation
Conventional text-to-motion generation methods are usually trained on limited text-motion
pairs, making them hard to generalize to open-vocabulary scenarios. Some works use the …