Monkey See, Monkey Do: Harnessing Self-Attention in Motion Diffusion for Zero-Shot Motion Transfer
Given the remarkable results of motion synthesis with diffusion models, a natural question
arises: how can we effectively leverage these models for motion editing? Existing diffusion …
UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
This paper presents UniPortrait, an innovative human image personalization framework that
unifies single- and multi-ID customization with high face fidelity, extensive facial editability …
Object-level Visual Prompts for Compositional Image Generation
We introduce a method for composing object-level visual prompts within a text-to-image
diffusion model. Our approach addresses the task of generating semantically coherent …
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Advanced diffusion models like RPG, Stable Diffusion 3, and FLUX have made notable
strides in compositional text-to-image generation. However, these methods typically exhibit …
MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models
Text-to-video models have demonstrated impressive capabilities in producing diverse and
captivating video content, showcasing a notable advancement in generative AI. However …
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Diffusion models have been recognized for their ability to generate images that are not only
visually appealing but also of high artistic quality. As a result, Layout-to-Image (L2I) …
Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
As the cutting-edge architecture among text-to-image models, the latest Multimodal
Diffusion Transformer (MMDiT) largely mitigates many generation issues present in previous …
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Diffusion models have demonstrated great success in the field of text-to-image generation.
However, alleviating the misalignment between the text prompts and images is still …
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation
Recently, large-scale generative models have demonstrated outstanding text-to-image
generation capabilities. However, generating high-fidelity personalized images with specific …
Enhancing Compositional Text-to-Image Generation with Reliable Random Seeds
Text-to-image diffusion models have demonstrated remarkable capability in generating
realistic images from arbitrary text prompts. However, they often produce inconsistent results …