Evolving interpretable visual classifiers with large language models

M Chiquier, U Mall, C Vondrick - European Conference on Computer …, 2024 - Springer
Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to
their open-vocabulary flexibility and high performance. However, vision-language models …

CoPT: Unsupervised domain adaptive segmentation using domain-agnostic text embeddings

C Mata, K Ranasinghe, MS Ryoo - European Conference on Computer …, 2024 - Springer
Unsupervised domain adaptation (UDA) involves learning class semantics from labeled
data within a source domain that generalize to an unseen target domain. UDA methods are …

From Words to Poses: Enhancing Novel Object Pose Estimation with Vision Language Models

T Pulli, S Thalhammer, S Schwaiger… - arXiv preprint arXiv …, 2024 - arxiv.org
Robots are increasingly envisioned to interact in real-world scenarios, where they must
continuously adapt to new situations. To detect and grasp novel objects, zero-shot pose …

Open-Vocabulary Object Detection via Neighboring Region Attention Alignment

S Qiang, X Li, Y Liang, W Liao, T He, P Peng - arXiv preprint arXiv …, 2024 - arxiv.org
The nature of diversity in real-world environments necessitates neural network models to
expand from closed category settings to accommodate novel emerging categories. In this …

EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection

Q Lei, B Wang, RT Tan - arXiv preprint arXiv:2410.23904, 2024 - arxiv.org
Detecting Human-Object Interactions (HOI) in zero-shot settings, where models must handle
unseen classes, poses significant challenges. Existing methods that rely on aligning visual …

SSPA: Split-and-Synthesize Prompting with Gated Alignments for Multi-Label Image Recognition

H Tan, Z Tan, J Li, J Wan, Z Lei, SZ Li - arXiv preprint arXiv:2407.20920, 2024 - arxiv.org
Multi-label image recognition is a fundamental task in computer vision. Recently, Vision-
Language Models (VLMs) have made notable advancements in this area. However …

Sampling Bag of Views for Open-Vocabulary Object Detection

H Choi, J Choe, H Shim - arXiv preprint arXiv:2412.18273, 2024 - arxiv.org
Existing open-vocabulary object detection (OVD) develops methods for testing unseen
categories by aligning object region embeddings with corresponding VLM features. A recent …

VisGraphVar: A Benchmark Generator for Assessing Variability in Graph Analysis Using Large Vision-Language Models

CC Sartori, C Blum, F Bistaffa - IEEE Access, 2025 - ieeexplore.ieee.org
The fast advancement of Large Vision-Language Models (LVLMs) has shown immense
potential. These models are increasingly capable of tackling abstract visual tasks. Geometric …

Test-Time Optimization for Domain Adaptive Open Vocabulary Segmentation

U De Silva, D Samaraweera, S Wanigathunga… - arXiv preprint arXiv …, 2025 - arxiv.org
We present Seg-TTO, a novel framework for zero-shot, open-vocabulary semantic
segmentation (OVSS), designed to excel in specialized domain tasks. While current open …

Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation

Y Zheng, K Liu - arXiv preprint arXiv:2404.08603, 2024 - arxiv.org
Open-vocabulary object detection (OVOD) aims at localizing and recognizing visual objects
from novel classes unseen at the training time. Whereas, empirical studies reveal that …