Animate-X: Universal character image animation with enhanced motion representation

S Tan, B Gong, X Wang, S Zhang, D Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Character image animation, which generates high-quality videos from a reference image
and target pose sequence, has seen significant progress in recent years. However, most …

ReCorD: Reasoning and correcting diffusion for HOI generation

JY Jiang-Lin, KY Huang, L Lo, YN Huang… - Proceedings of the …, 2024 - dl.acm.org
Diffusion models revolutionize image generation by leveraging natural language to guide
the creation of multimedia content. Despite significant advancements in such generative …

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

T Hu, L Li, J van de Weijer, H Gao, FS Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
Although text-to-image (T2I) models exhibit remarkable generation capabilities, they
frequently fail to accurately bind semantically related objects or attributes in the input …

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

S Shi, B Gong, X Chen, D Zheng, S Tan, Z Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Image-to-video (I2V) generation is conditioned on a static image and has been
enhanced recently by the motion intensity as an additional control signal. These motion …

Mimir: Improving Video Diffusion Models for Precise Text Understanding

S Tan, B Gong, Y Feng, K Zheng, D Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Text serves as the key control signal in video generation due to its narrative nature. To
render text descriptions into video clips, current video diffusion models borrow features from …

VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis

Z Chen, L Yang, Y Qi, H Zhang, K Pang, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the rapid advancements in text-to-image (T2I) synthesis, enabling precise visual
control remains a significant challenge. Existing works attempted to incorporate multi-facet …

Improving faithfulness of text-to-image diffusion models through inference intervention

D Guo, S Agarwal, YH Lin, JY Kao, T Chung, N Peng… - 2025 - amazon.science
Text-to-Image diffusion models have shown remarkable capabilities in generating high-
quality images. However, current models often struggle to adhere to the complete set of …