Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …

State of the art on diffusion models for visual computing

R Po, W Yifan, V Golyanik, K Aberman… - Computer Graphics …, 2024 - Wiley Online Library
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …

DragDiffusion: Harnessing diffusion models for interactive point-based image editing

Y Shi, C Xue, JH Liew, J Pan, H Yan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Accurate and controllable image editing is a challenging task that has attracted significant
attention recently. Notably, DragGAN, developed by Pan et al. (2023), is an interactive point …

DragonDiffusion: Enabling drag-style manipulation on diffusion models

C Mou, X Wang, J Song, Y Shan, J Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Although existing large-scale text-to-image (T2I) models can generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …

InternGPT: Solving vision-centric tasks by interacting with ChatGPT beyond language

Z Liu, Y He, W Wang, W Wang, Y Wang, S Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We present an interactive visual framework named InternGPT, or iGPT for short. The
framework integrates chatbots that have planning and reasoning capabilities, such as …

Motion-I2V: Consistent and controllable image-to-video generation with explicit motion modeling

X Shi, Z Huang, FY Wang, W Bian, D Li… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …

DragAnything: Motion control for anything using entity representation

W Wu, Z Li, Y Gu, R Zhao, Y He, DJ Zhang… - … on Computer Vision, 2024 - Springer
We introduce DragAnything, which utilizes an entity representation to achieve motion control
for any object in controllable video generation. In comparison to existing motion control …

Diffusion model-based image editing: A survey

Y Huang, J Huang, Y Liu, M Yan, J Lv, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …

PromptMagician: Interactive prompt engineering for text-to-image creation

Y Feng, X Wang, KK Wong, S Wang… - … on Visualization and …, 2023 - ieeexplore.ieee.org
Generative text-to-image models have gained great popularity among the public for their
powerful capability to generate high-quality images based on natural language prompts …