ViperGPT: Visual inference via Python execution for reasoning
Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …
Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …
Waffling around for performance: Visual classification with random words and broad concepts
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …
Learning without forgetting for vision-language models
Class-Incremental Learning (CIL) or continual learning is a desired capability in the real
world, which requires a learning system to adapt to new tasks without forgetting former ones …
CoTDet: Affordance knowledge prompting for task driven object detection
Task driven object detection aims to detect object instances suitable for affording a task in an
image. Its challenge lies in the object categories available for the task being too diverse to be …
Prompt learning in computer vision: a survey
Prompt learning has attracted broad attention in computer vision since the large pre-trained
vision-language models (VLMs) exploded. Based on the close relationship between vision …
Follow the rules: reasoning for video anomaly detection with large language models
Video Anomaly Detection (VAD) is crucial for applications such as security
surveillance and autonomous driving. However, existing VAD methods provide little …
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
Vision Language Models (VLMs) excel in zero-shot image classification by pairing
images with textual category names. The expanding variety of pre-trained VLMs enhances …
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
CLIP showcases exceptional cross-modal matching capabilities due to its training on image-
text contrastive learning tasks. However, without specific optimization for unimodal scenarios …
Convolutional Prompting meets Language Models for Continual Learning
Continual Learning (CL) enables machine learning models to learn from continuously
shifting new training data in the absence of data from old tasks. Recently, pre-trained vision …