Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role in the creation and perception of multimodal …

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

DreamBooth3D: Subject-driven text-to-3D generation

A Raj, S Kaza, B Poole, M Niemeyer… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present DreamBooth3D, an approach to personalize text-to-3D generative models from
as few as 3-6 casually captured images of a subject. Our approach combines recent …

SyncDreamer: Generating multiview-consistent images from a single-view image

Y Liu, C Lin, Z Zeng, X Long, L Liu, T Komura… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent
images from a single-view image. Using pretrained large-scale 2D diffusion models, recent …

Lumiere: A space-time diffusion model for video generation

O Bar-Tal, H Chefer, O Tov, C Herrmann… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis. To this …

TokenFlow: Consistent diffusion features for consistent video editing

M Geyer, O Bar-Tal, S Bagon, T Dekel - arXiv preprint arXiv:2307.10373, 2023 - arxiv.org
The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art
video models are still lagging behind image models in terms of visual quality and …

Dense text-to-image generation with attention modulation

Y Kim, J Lee, JH Kim, JW Ha… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing text-to-image diffusion models struggle to synthesize realistic images given dense
captions, where each text prompt provides a detailed description for a specific image region …

Adversarial attacks and defenses on text-to-image diffusion models: A survey

C Zhang, M Hu, W Li, L Wang - Information Fusion, 2024 - Elsevier
Recently, the text-to-image diffusion model has gained considerable attention from the
community due to its exceptional image generation capability. A representative model …

Localizing object-level shape variations with text-to-image diffusion models

O Patashnik, D Garibi, I Azuri… - Proceedings of the …, 2023 - openaccess.thecvf.com
Text-to-image models give rise to workflows which often begin with an exploration step,
where users sift through a large collection of generated images. The global nature of the text …