Disentangled representation learning
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying
and disentangling the underlying factors hidden in the observable data in representation …
Vmc: Video motion customization using temporal attention adaption for text-to-video diffusion models
Text-to-video diffusion models have advanced video generation significantly. However,
customizing these models to generate videos with tailored motions presents a substantial …
Videobooth: Diffusion-based video generation with image prompts
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …
Motionbooth: Motion-aware customized text-to-video generation
In this work, we present MotionBooth, an innovative framework designed for animating
customized subjects with precise control over both object and camera movements. By …
InstructVideo: instructing video diffusion models with human feedback
Diffusion models have emerged as the de facto paradigm for video generation. However,
their reliance on web-scale data of varied quality often yields results that are visually …
MC: Multi-concept Guidance for Customized Multi-concept Generation
Customized text-to-image generation, which synthesizes images based on user-specified
concepts, has made significant progress in handling individual concepts. However, when …
Disenstudio: Customized multi-subject text-to-video generation with disentangled spatial control
Generating customized content in videos has received increasing attention recently.
However, existing works primarily focus on customized text-to-video generation for single …
Magdiff: Multi-alignment diffusion for high-fidelity video generation and editing
Diffusion models are widely leveraged for either video generation or video editing. As each
field has its task-specific problems, it is difficult to merely develop a single diffusion for …
Multi-modal generative ai: Multi-modal llm, diffusion and beyond
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …
Videoassembler: Identity-consistent video generation with reference entities using diffusion model
Identity-consistent video generation seeks to synthesize videos that are guided by both
textual prompts and reference images of entities. Current approaches typically utilize cross …