Monkey see, monkey do: Harnessing self-attention in motion diffusion for zero-shot motion transfer

S Raab, I Gat, N Sala, G Tevet… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Given the remarkable results of motion synthesis with diffusion models, a natural question
arises: how can we effectively leverage these models for motion editing? Existing diffusion …

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

J He, Y Geng, L Bo - arXiv preprint arXiv:2408.05939, 2024 - arxiv.org
This paper presents UniPortrait, an innovative human image personalization framework that
unifies single- and multi-ID customization with high face fidelity, extensive facial editability …

Object-level Visual Prompts for Compositional Image Generation

G Parmar, O Patashnik, KC Wang, D Ostashev… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce a method for composing object-level visual prompts within a text-to-image
diffusion model. Our approach addresses the task of generating semantically coherent …

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

X Zhang, L Yang, G Li, Y Cai, J Xie, Y Tang… - arXiv preprint arXiv …, 2024 - arxiv.org
Advanced diffusion models like RPG, Stable Diffusion 3 and FLUX have made notable
strides in compositional text-to-image generation. However, these methods typically exhibit …

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

THS Meral, H Yesiltepe, C Dunlop… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-video models have demonstrated impressive capabilities in producing diverse and
captivating video content, showcasing a notable advancement in generative AI. However …

CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation

H Zhang, D Hong, T Gao, Y Wang, J Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …

Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation

T Wei, D Chen, Y Zhou, X Pan - arXiv preprint arXiv:2411.18301, 2024 - arxiv.org
Representing the cutting-edge technique of text-to-image models, the latest Multimodal
Diffusion Transformer (MMDiT) largely mitigates many generation issues existing in previous …

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

D Jiang, G Song, X Wu, R Zhang, D Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated great success in the field of text-to-image generation.
However, alleviating the misalignment between the text prompts and images is still …

AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation

J He, Y Tuo, B Chen, C Zhong, Y Geng, L Bo - arXiv preprint arXiv …, 2025 - arxiv.org
Recently, large-scale generative models have demonstrated outstanding text-to-image
generation capabilities. However, generating high-fidelity personalized images with specific …

Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds

S Li, H Le, J Xu, M Salzmann - arXiv preprint arXiv:2411.18810, 2024 - arxiv.org
Text-to-image diffusion models have demonstrated remarkable capability in generating
realistic images from arbitrary text prompts. However, they often produce inconsistent results …