A survey of multimodal-guided image editing with text-to-image diffusion models

X Shuai, H Ding, X Ma, R Tu, YG Jiang… - arXiv preprint, 2024 - arxiv.org
Image editing aims to modify a given synthetic or real image to meet specific user requirements. It has been widely studied in recent years as a promising and challenging field of …

Training-free consistent text-to-image generation

Y Tewel, O Kaduri, R Gal, Y Kasten, L Wolf… - ACM Transactions on …, 2024 - dl.acm.org
Text-to-image models offer a new level of creative flexibility by allowing users to guide the
image generation process through natural language. However, using these models to …

Be yourself: Bounded attention for multi-subject text-to-image generation

O Dahary, O Patashnik, K Aberman… - European Conference on …, 2024 - Springer
Text-to-image diffusion models have an unprecedented ability to generate diverse and high-
quality images. However, they often struggle to faithfully capture the intended semantics of …

Ctrl-x: Controlling structure and appearance for text-to-image generation without guidance

KH Lin, S Mo, B Klingher, F Mu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent controllable generation approaches such as FreeControl and Diffusion Self-
Guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion …

Generative rendering: Controllable 4D-guided video generation with 2D diffusion models

S Cai, D Ceylan, M Gadelha… - Proceedings of the …, 2024 - openaccess.thecvf.com
Traditional 3D content creation tools empower users to bring their imagination to life by
giving them direct control over a scene's geometry, appearance, motion, and camera path …

Relightful harmonization: Lighting-aware portrait background replacement

M Ren, W Xiong, JS Yoon, Z Shu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Portrait harmonization aims to composite a subject into a new background, adjusting its
lighting and color to ensure harmony with the background scene. Existing harmonization …

Zero-to-hero: Enhancing zero-shot novel view synthesis via attention map filtering

I Sobol, C Xu, O Litany - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Generating realistic images from arbitrary views based on a single source image remains a
significant challenge in computer vision, with broad applications ranging from e-commerce …

Generic 3D diffusion adapter using controlled multi-view editing

H Chen, R Shi, Y Liu, B Shen, J Gu, G Wetzstein… - arXiv preprint, 2024 - arxiv.org
Open-domain 3D object synthesis has been lagging behind image synthesis due to limited
data and higher computational complexity. To bridge this gap, recent works have …

FontStudio: Shape-adaptive diffusion model for coherent and consistent font effect generation

X Mu, L Chen, B Chen, S Gu, J Bao, D Chen… - … on Computer Vision, 2024 - Springer
Recently, the application of modern diffusion-based text-to-image generation models for
creating artistic fonts, traditionally the domain of professional designers, has garnered …

Monkey see, monkey do: Harnessing self-attention in motion diffusion for zero-shot motion transfer

S Raab, I Gat, N Sala, G Tevet… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Given the remarkable results of motion synthesis with diffusion models, a natural question
arises: how can we effectively leverage these models for motion editing? Existing diffusion …