Multimodal image synthesis and editing: A survey and taxonomy
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information play a key role in the creation and perception of multimodal …
State of the art on diffusion models for visual computing
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …
DragDiffusion: Harnessing diffusion models for interactive point-based image editing
Accurate and controllable image editing is a challenging task that has attracted significant
attention recently. Notably, DragGAN, developed by Pan et al. (2023), is an interactive point …
DragonDiffusion: Enabling drag-style manipulation on diffusion models
Despite the ability of existing large-scale text-to-image (T2I) models to generate high-quality
images from detailed textual descriptions, they often lack the ability to precisely edit the …
InternGPT: Solving vision-centric tasks by interacting with ChatGPT beyond language
We present an interactive visual framework named InternGPT, or iGPT for short. The
framework integrates chatbots that have planning and reasoning capabilities, such as …
Motion-i2v: Consistent and controllable image-to-video generation with explicit motion modeling
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …
A systematic survey of prompt engineering on vision-language foundation models
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …
DragAnything: Motion control for anything using entity representation
We introduce DragAnything, which utilizes an entity representation to achieve motion control
for any object in controllable video generation. Comparison to existing motion control …
Diffusion model-based image editing: A survey
Denoising diffusion models have emerged as a powerful tool for various image generation
and editing tasks, facilitating the synthesis of visual content in an unconditional or input …
PromptMagician: Interactive prompt engineering for text-to-image creation
Generative text-to-image models have gained great popularity among the public for their
powerful capability to generate high-quality images based on natural language prompts …