Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
a strong ability to learn complex structures and meaningful semantics. However, relying …

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
While generative models produce high-quality images of concepts learned from a large-scale
database, a user often wishes to synthesize instantiations of their own concepts (for …

eDiff-I: Text-to-image diffusion models with an ensemble of expert denoisers

Y Balaji, S Nah, X Huang, A Vahdat, J Song… - arXiv preprint arXiv …, 2022 - arxiv.org
Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned
high-resolution image synthesis. Starting from random noise, such text-to-image …

GET3D: A generative model of high quality 3D textured shapes learned from images

J Gao, T Shen, Z Wang, W Chen… - Advances in …, 2022 - proceedings.neurips.cc
As several industries are moving towards modeling massive 3D virtual worlds, the need for
content creation tools that can scale in terms of the quantity, quality, and diversity of 3D …

Ablating concepts in text-to-image diffusion models

N Kumari, B Zhang, SY Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale text-to-image diffusion models can generate high-fidelity images with powerful
compositional ability. However, these models are typically trained on an enormous amount …

OmniObject3D: Large-vocabulary 3D object dataset for realistic perception, reconstruction and generation

T Wu, J Zhang, X Fu, Y Wang, J Ren… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in modeling 3D objects mostly rely on synthetic datasets due to the lack of
large-scale real-scanned 3D databases. To facilitate the development of 3D perception …

Composer: Creative and controllable image synthesis with composable conditions

L Huang, D Chen, Y Liu, Y Shen, D Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent large-scale generative models trained on big data are capable of synthesizing
incredible images yet suffer from limited controllability. This work offers a new generation …

Dense text-to-image generation with attention modulation

Y Kim, J Lee, JH Kim, JW Ha… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Existing text-to-image diffusion models struggle to synthesize realistic images given dense
captions, where each text prompt provides a detailed description for a specific image region …

Sketch-guided text-to-image diffusion models

A Voynov, K Aberman, D Cohen-Or - ACM SIGGRAPH 2023 Conference …, 2023 - dl.acm.org
Text-to-image models have marked a remarkable leap in the evolution of machine
learning, demonstrating high-quality synthesis of images from a given text prompt. However …