Evolving interpretable visual classifiers with large language models
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to
their open-vocabulary flexibility and high performance. However, vision-language models …
CoPT: Unsupervised domain adaptive segmentation using domain-agnostic text embeddings
Unsupervised domain adaptation (UDA) involves learning class semantics from labeled
data within a source domain that generalize to an unseen target domain. UDA methods are …
From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models
Robots are increasingly envisioned to interact in real-world scenarios, where they must
continuously adapt to new situations. To detect and grasp novel objects, zero-shot pose …
Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
The nature of diversity in real-world environments necessitates neural network models to
expand from closed category settings to accommodate novel emerging categories. In this …
EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection
Detecting Human-Object Interactions (HOI) in zero-shot settings, where models must handle
unseen classes, poses significant challenges. Existing methods that rely on aligning visual …
SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition
Multi-label image recognition is a fundamental task in computer vision. Recently, Vision-
Language Models (VLMs) have made notable advancements in this area. However …
Sampling Bag of Views for Open-Vocabulary Object Detection
Existing open-vocabulary object detection (OVD) develops methods for testing unseen
categories by aligning object region embeddings with corresponding VLM features. A recent …
VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models
The fast advancement of Large Vision-Language Models (LVLMs) has shown immense
potential. These models are increasingly capable of tackling abstract visual tasks. Geometric …
Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation
We present Seg-TTO, a novel framework for zero-shot, open-vocabulary semantic
segmentation (OVSS), designed to excel in specialized domain tasks. While current open …
Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation
Y Zheng, K Liu - arXiv preprint arXiv:2404.08603, 2024 - arxiv.org
Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects
from novel classes unseen at the training time. Whereas, empirical studies reveal that …