Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation

Q Guo, T Lin - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion-based methods like InstructPix2Pix (IP2P) have achieved effective
instruction-based image editing requiring only natural language instructions from the user …

StableNormal: Reducing diffusion variance for stable and sharp normal

C Ye, L Qiu, X Gu, Q Zuo, Y Wu, Z Dong, L Bo… - ACM Transactions on …, 2024 - dl.acm.org
This work addresses the challenge of high-quality surface normal estimation from monocular
colored inputs (i.e., images and videos), a field which has recently been revolutionized by …

Exploiting Diffusion Prior for Generalizable Dense Prediction

HY Lee, HY Tseng, MH Yang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Contents generated by recent advanced Text-to-Image (T2I) diffusion models are sometimes
too imaginative for existing off-the-shelf dense predictors to estimate due to the immitigable …

DragAPart: Learning a part-level motion prior for articulated objects

R Li, C Zheng, C Rupprecht, A Vedaldi - European Conference on …, 2024 - Springer
We introduce DragAPart, a method that, given an image and a set of drags as input,
generates a new image of the same object that responds to the action of the drags …

UniGS: Unified representation for image generation and segmentation

L Qi, L Yang, W Guo, Y Xu, B Du… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper introduces a novel unified representation of diffusion models for image
generation and segmentation. Specifically, we use a colormap to represent entity-level …

Telling left from right: Identifying geometry-aware semantic correspondence

J Zhang, C Herrmann, J Hur, E Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
While pre-trained large-scale vision models have shown significant promise for semantic
correspondence, their features often struggle to grasp the geometry and orientation of …

Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval

D Samuel, R Ben-Ari, M Levy… - Advances in Neural …, 2025 - proceedings.neurips.cc
Personalized retrieval and segmentation aim to locate specific instances within a dataset
based on an input image and a short description of the reference instance. While supervised …

MGSGNet-S*: Multilayer guided Semantic graph network via knowledge distillation for RGB-thermal urban scene parsing

W Zhou, H Wu, Q Jiang - IEEE Transactions on Intelligent …, 2024 - ieeexplore.ieee.org
Owing to rapid developments in driverless technologies, vision tasks for unmanned vehicles
have gained considerable attention, particularly in multimodal-based urban scene parsing …

SLiMe: Segment like me

A Khani, SA Taghanaki, A Sanghi, AM Amiri… - arXiv preprint arXiv …, 2023 - arxiv.org
Significant strides have been made using large vision-language models, like Stable
Diffusion (SD), for a variety of downstream tasks, including image editing, image …

Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models

P Marcos-Manchón, R Alcover-Couso… - Proceedings of the …, 2024 - openaccess.thecvf.com
Diffusion models represent a new paradigm in text-to-image generation. Beyond generating
high-quality images from text prompts, models such as Stable Diffusion have been …