Language-conditioned detection transformer

JH Cho, P Krähenbühl - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a new open-vocabulary detection framework. Our framework uses both
image-level labels and detailed detection annotations when available. Our framework proceeds in …

OLAF: A Plug-and-Play Framework for Enhanced Multi-object Multi-part Scene Parsing

P Gupta, R Singh, P Shenoy… - European Conference on …, 2024 - Springer
Multi-object multi-part scene segmentation is a challenging task whose complexity scales
exponentially with part granularity and number of scene objects. To address the task, we …

CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models

KA Nguyen, A Juvekar, T Yu, M Wahed… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Large Vision-Language Models (LVLMs) have sparked significant
progress in general-purpose vision tasks through visual instruction tuning. While some …