FreeU: Free lunch in diffusion U-Net

C Si, Z Huang, Y Jiang, Z Liu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In this paper, we uncover the untapped potential of diffusion U-Net, which serves as a "free
lunch" that substantially improves the generation quality on the fly. We initially investigate …

MotionBooth: Motion-aware customized text-to-video generation

J Wu, X Li, Y Zeng, J Zhang, Q Zhou… - Advances in …, 2025 - proceedings.neurips.cc
In this work, we present MotionBooth, an innovative framework designed for animating
customized subjects with precise control over both object and camera movements. By …

ID-Animator: Zero-shot identity-preserving human video generation

X He, Q Liu, S Qian, X Wang, T Hu, K Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating high-fidelity human video with specified identities has attracted significant
attention in the content generation community. However, existing techniques struggle to …

Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

CustomVideo: Customizing text-to-video generation with multiple subjects

Z Wang, A Li, L Zhu, Y Guo, Q Dou, Z Li - arXiv preprint arXiv:2401.09962, 2024 - arxiv.org
Customized text-to-video generation aims to generate high-quality videos guided by text
prompts and subject references. Current approaches for personalizing text-to-video …

CustomCrafter: Customized video generation with preserving motion and concept composition abilities

T Wu, Y Zhang, X Wang, X Zhou, G Zheng, Z Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Customized video generation aims to generate high-quality videos guided by text prompts
and a subject's reference images. However, since it is only trained on static images, the fine …

Lumina-T2X: Transforming text into any modality, resolution, and duration via flow-based large diffusion transformers

P Gao, L Zhuo, D Liu, R Du, X Luo, L Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic
images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks …

AttnDreamBooth: Towards text-aligned personalized text-to-image generation

L Pang, J Yin, B Zhao, F Wu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent advances in text-to-image models have enabled high-quality personalized image
synthesis based on user-provided concepts with flexible textual control. In this work, we …

DisenStudio: Customized multi-subject text-to-video generation with disentangled spatial control

H Chen, X Wang, Y Zhang, Y Zhou, Z Zhang… - Proceedings of the …, 2024 - dl.acm.org
Generating customized content in videos has received increasing attention recently.
However, existing works primarily focus on customized text-to-video generation for single …

A survey on personalized content synthesis with diffusion models

X Zhang, XY Wei, W Zhang, J Wu, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in generative models have significantly impacted content creation,
leading to the emergence of Personalized Content Synthesis (PCS). With a small set of user …