Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

DragDiffusion: Harnessing diffusion models for interactive point-based image editing

Y Shi, C Xue, JH Liew, J Pan, H Yan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Accurate and controllable image editing is a challenging task that has attracted significant
attention recently. Notably, DragGAN, developed by Pan et al. (2023), is an interactive point …

DragonDiffusion: Enabling drag-style manipulation on diffusion models

C Mou, X Wang, J Song, Y Shan, J Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Although existing large-scale text-to-image (T2I) models can generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …

InternGPT: Solving vision-centric tasks by interacting with ChatGPT beyond language

Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We present an interactive visual framework named InternGPT, or iGPT for short. The
framework integrates chatbots that have planning and reasoning capabilities, such as …

Motion-I2V: Consistent and controllable image-to-video generation with explicit motion modeling

X Shi, Z Huang, FY Wang, W Bian, D Li… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …

DragAnything: Motion control for anything using entity representation

W Wu, Z Li, Y Gu, R Zhao, Y He, DJ Zhang… - … on Computer Vision, 2024 - Springer
We introduce DragAnything, which utilizes an entity representation to achieve motion control
for any object in controllable video generation. In comparison to existing motion control …

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

PromptMagician: Interactive prompt engineering for text-to-image creation

Y Feng, X Wang, KK Wong, S Wang… - … on Visualization and …, 2023 - ieeexplore.ieee.org
Generative text-to-image models have gained great popularity among the public for their
powerful capability to generate high-quality images based on natural language prompts …