Dense text-to-image generation with attention modulation

Y Kim, J Lee, JH Kim, JW Ha… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing text-to-image diffusion models struggle to synthesize realistic images given dense
captions, where each text prompt provides a detailed description for a specific image region …

Be yourself: Bounded attention for multi-subject text-to-image generation

O Dahary, O Patashnik, K Aberman… - European Conference on …, 2024 - Springer
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-
quality images. However, they often struggle to faithfully capture the intended semantics of …

Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

LoCo: Locally constrained training-free layout-to-image synthesis

P Zhao, H Li, R Jin, SK Zhou - arXiv preprint arXiv:2311.12342, 2023 - arxiv.org
Recent text-to-image diffusion models have reached an unprecedented level in generating
high-quality images. However, their exclusive reliance on textual prompts often falls short in …

Multi-modal generative AI: Multi-modal LLM, diffusion and beyond

H Chen, X Wang, Y Zhou, B Huang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal generative AI has received increasing attention in both academia and industry.
Particularly, two dominant families of techniques are: i) The multi-modal large language …

Personalized residuals for concept-driven text-to-image generation

C Ham, M Fisher, J Hays, N Kolkin… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present personalized residuals and localized attention-guided sampling for efficient
concept-driven generation using text-to-image diffusion models. Our method first represents …

A survey of multimodal controllable diffusion models

R Jiang, GC Zheng, T Li, TR Yang, JD Wang… - Journal of Computer …, 2024 - Springer
Diffusion models have recently emerged as powerful generative models, producing high-
fidelity samples across domains. Despite this, they have two key challenges, including …

Layered rendering diffusion model for zero-shot guided image synthesis

Z Qi, G Huang, Z Huang, Q Guo, J Chen, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces innovative solutions to enhance spatial controllability in diffusion
models reliant on text queries. We present two key innovations: Vision Guidance and the …

Object-level Visual Prompts for Compositional Image Generation

G Parmar, O Patashnik, KC Wang, D Ostashev… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce a method for composing object-level visual prompts within a text-to-image
diffusion model. Our approach addresses the task of generating semantically coherent …

LoMOE: Localized multi-object editing via multi-diffusion

G Chakrabarty, A Chandrasekar… - Proceedings of the …, 2024 - dl.acm.org
Recent developments in diffusion models have demonstrated an exceptional capacity to
generate high-quality, prompt-conditioned image edits. Nevertheless, previous approaches …