Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …
T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …
Multi-concept customization of text-to-image diffusion
While generative models produce high-quality images of concepts learned from a
large-scale database, a user often wishes to synthesize instantiations of their own concepts (for …
eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers
Large-scale diffusion-based generative models have led to breakthroughs in
text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image …
GET3D: A generative model of high quality 3D textured shapes learned from images
As several industries are moving towards modeling massive 3D virtual worlds, the need for
content creation tools that can scale in terms of the quantity, quality, and diversity of 3D …
Ablating concepts in text-to-image diffusion models
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful
compositional ability. However, these models are typically trained on an enormous amount …
OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation
Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of
large-scale real-scanned 3D databases. To facilitate the development of 3D perception …
Composer: Creative and controllable image synthesis with composable conditions
Recent large-scale generative models learned on big data are capable of synthesizing
incredible images yet suffer from limited controllability. This work offers a new generation …
Dense text-to-image generation with attention modulation
Existing text-to-image diffusion models struggle to synthesize realistic images given dense
captions, where each text prompt provides a detailed description for a specific image region …
Sketch-guided text-to-image diffusion models
Text-to-Image models have introduced a remarkable leap in the evolution of machine
learning, demonstrating high-quality synthesis of images from a given text-prompt. However …