Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role in the creation and perception of multimodal …
Adding conditional control to text-to-image diffusion models
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …
Emergent correspondence from image diffusion
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …
DreamBooth3D: Subject-driven text-to-3D generation
We present DreamBooth3D, an approach to personalize text-to-3D generative models from
as few as 3-6 casually captured images of a subject. Our approach combines recent …
SyncDreamer: Generating multiview-consistent images from a single-view image
In this paper, we present a novel diffusion model called SyncDreamer that generates multiview-consistent
images from a single-view image. Using pretrained large-scale 2D diffusion models, recent …
Lumiere: A space-time diffusion model for video generation
We introduce Lumiere, a text-to-video diffusion model designed for synthesizing videos that
portray realistic, diverse and coherent motion, a pivotal challenge in video synthesis. To this …
TokenFlow: Consistent diffusion features for consistent video editing
The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art
video models are still lagging behind image models in terms of visual quality and …
Dense text-to-image generation with attention modulation
Existing text-to-image diffusion models struggle to synthesize realistic images given dense
captions, where each text prompt provides a detailed description for a specific image region …
Adversarial attacks and defenses on text-to-image diffusion models: A survey
C Zhang, M Hu, W Li, L Wang - Information Fusion, 2024 - Elsevier
Recently, the text-to-image diffusion model has gained considerable attention from the
community due to its exceptional image generation capability. A representative model …
Localizing object-level shape variations with text-to-image diffusion models
Text-to-image models give rise to workflows which often begin with an exploration step,
where users sift through a large collection of generated images. The global nature of the text …