Disentangled representation learning

X Wang, H Chen, Z Wu, W Zhu - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Disentangled Representation Learning (DRL) aims to learn a model capable of identifying
and disentangling the underlying factors hidden in the observable data in representation …

Vmc: Video motion customization using temporal attention adaption for text-to-video diffusion models

H Jeong, GY Park, JC Ye - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Text-to-video diffusion models have advanced video generation significantly. However,
customizing these models to generate videos with tailored motions presents a substantial …

Videobooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …

Motionbooth: Motion-aware customized text-to-video generation

J Wu, X Li, Y Zeng, J Zhang, Q Zhou, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we present MotionBooth, an innovative framework designed for animating
customized subjects with precise control over both object and camera movements. By …

InstructVideo: instructing video diffusion models with human feedback

H Yuan, S Zhang, X Wang, Y Wei… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models have emerged as the de facto paradigm for video generation. However,
their reliance on web-scale data of varied quality often yields results that are visually …

MC²: Multi-concept Guidance for Customized Multi-concept Generation

J Jiang, Y Zhang, K Feng, X Wu, W Li, R Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
Customized text-to-image generation, which synthesizes images based on user-specified
concepts, has made significant progress in handling individual concepts. However, when …

Disenstudio: Customized multi-subject text-to-video generation with disentangled spatial control

H Chen, X Wang, Y Zhang, Y Zhou, Z Zhang… - Proceedings of the …, 2024 - dl.acm.org
Generating customized content in videos has received increasing attention recently.
However, existing works primarily focus on customized text-to-video generation for single …

Magdiff: Multi-alignment diffusion for high-fidelity video generation and editing

H Zhao, T Lu, J Gu, X Zhang, Q Zheng, Z Wu… - … on Computer Vision, 2024 - Springer
The diffusion model is widely leveraged for either video generation or video editing. As each
field has its task-specific problems, it is difficult to merely develop a single diffusion for …

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

Videoassembler: Identity-consistent video generation with reference entities using diffusion model

H Zhao, T Lu, J Gu, X Zhang, Z Wu, H Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Identity-consistent video generation seeks to synthesize videos that are guided by both
textual prompts and reference images of entities. Current approaches typically utilize cross …