Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation
Q Guo, T Lin - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion-based methods like InstructPix2Pix (IP2P) have achieved effective
instruction-based image editing, requiring only natural language instructions from the user …
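The snippet above attributes the editing behavior to attention modulation. As a rough illustration only (not the paper's actual method), the sketch below rescales the cross-attention weights of the prompt tokens that carry the edit instruction and renormalizes; the function name, tensor shapes, and scale factor are all assumptions.

```python
import torch

def modulate_cross_attention(attn_probs: torch.Tensor,
                             target_token_ids: list[int],
                             scale: float = 2.0) -> torch.Tensor:
    """Illustrative sketch: amplify attention to instruction tokens.

    attn_probs: (batch, query_pixels, text_tokens) softmaxed cross-attention.
    target_token_ids: indices of the prompt tokens describing the edit.
    scale: amplification factor (a hypothetical hyperparameter).
    """
    modulated = attn_probs.clone()
    modulated[..., target_token_ids] *= scale
    # Renormalize so each query pixel's attention still sums to 1.
    return modulated / modulated.sum(dim=-1, keepdim=True)

# Example: boost attention to tokens 4 and 5 of the instruction prompt.
probs = torch.softmax(torch.randn(1, 4096, 77), dim=-1)
out = modulate_cross_attention(probs, target_token_ids=[4, 5], scale=2.0)
```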
StableNormal: Reducing diffusion variance for stable and sharp normal
This work addresses the challenge of high-quality surface normal estimation from monocular
colored inputs (i.e., images and videos), a field which has recently been revolutionized by …
Exploiting Diffusion Prior for Generalizable Dense Prediction
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes
too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable …
DragAPart: Learning a part-level motion prior for articulated objects
We introduce DragAPart, a method that, given an image and a set of drags as input,
generates a new image of the same object that responds to the action of the drags …
UniGS: Unified representation for image generation and segmentation
This paper introduces a novel unified representation of diffusion models for image
generation and segmentation. Specifically, we use a colormap to represent entity-level …
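To make the colormap idea concrete, here is a minimal sketch (not UniGS's actual encoding) that paints each entity mask with its own color, so a segmentation lives in the same RGB space a diffusion model generates; the random palette and function name are illustrative assumptions.

```python
import numpy as np

def masks_to_colormap(masks: np.ndarray, seed: int = 0) -> np.ndarray:
    """Illustrative sketch: encode entity masks as an RGB colormap image.

    masks: (num_entities, H, W) boolean array, one mask per entity.
    Returns an (H, W, 3) uint8 image; the palette is random and purely
    illustrative, not the paper's encoding.
    """
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(masks.shape[0], 3), dtype=np.uint8)
    canvas = np.zeros((*masks.shape[1:], 3), dtype=np.uint8)
    for mask, color in zip(masks, palette):
        canvas[mask] = color
    return canvas

# Example: two toy entity masks on a 64x64 canvas.
m = np.zeros((2, 64, 64), dtype=bool)
m[0, :32], m[1, 32:] = True, True
rgb = masks_to_colormap(m)
```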
Telling left from right: Identifying geometry-aware semantic correspondence
While pre-trained large-scale vision models have shown significant promise for semantic
correspondence, their features often struggle to grasp the geometry and orientation of …
Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval
Personalized retrieval and segmentation aim to locate specific instances within a dataset
based on an input image and a short description of the reference instance. While supervised …
MGSGNet-S*: Multilayer guided semantic graph network via knowledge distillation for RGB-thermal urban scene parsing
Owing to rapid developments in driverless technologies, vision tasks for unmanned vehicles
have gained considerable attention, particularly in multimodal-based urban scene parsing …
SLiMe: Segment like me
Significant strides have been made using large vision-language models, like Stable
Diffusion (SD), for a variety of downstream tasks, including image editing, image …
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating
high-quality images from text prompts, models such as Stable Diffusion have been …
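As a rough illustration of how per-token cross-attention can yield a segmentation (a generic sketch, not the paper's token-optimization method), the code below reshapes attention over latent pixels into spatial maps for a few chosen class tokens, upsamples them, and takes an argmax; all names and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def attention_to_segmentation(attn_probs: torch.Tensor,
                              class_token_ids: list[int],
                              image_size: tuple[int, int]) -> torch.Tensor:
    """Illustrative sketch: turn per-token cross-attention into a label map.

    attn_probs: (query_pixels, text_tokens) averaged cross-attention from a
    diffusion U-Net layer; query_pixels must be a square number.
    class_token_ids: one prompt-token index per class of interest.
    Returns an (H, W) tensor of class indices.
    """
    side = int(attn_probs.shape[0] ** 0.5)
    maps = attn_probs[:, class_token_ids]                 # (pixels, classes)
    maps = maps.T.reshape(1, len(class_token_ids), side, side)
    maps = F.interpolate(maps, size=image_size, mode="bilinear",
                         align_corners=False)
    return maps.argmax(dim=1)[0]                          # (H, W)

# Example: fake attention over a 32x32 latent grid and 77 prompt tokens.
attn = torch.softmax(torch.randn(32 * 32, 77), dim=-1)
seg = attention_to_segmentation(attn, class_token_ids=[2, 5, 9],
                                image_size=(512, 512))
```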