ViperGPT: Visual inference via Python execution for reasoning

D Surís, S Menon, C Vondrick - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

Waffling around for performance: Visual classification with random words and broad concepts

K Roth, JM Kim, A Koepke, O Vinyals… - Proceedings of the …, 2023 - openaccess.thecvf.com
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …

Learning without forgetting for vision-language models

DW Zhou, Y Zhang, Y Wang, J Ning… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Class-Incremental Learning (CIL), or continual learning, is a desired capability in the real
world, requiring a learning system to adapt to new tasks without forgetting former ones …

CoTDet: Affordance knowledge prompting for task-driven object detection

J Tang, G Zheng, J Yu, S Yang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Task-driven object detection aims to detect object instances suitable for affording a task in an
image. Its challenge lies in the object categories available for the task being too diverse to be …

Prompt learning in computer vision: a survey

Y Lei, J Li, Z Li, Y Cao, H Shan - Frontiers of Information Technology & …, 2024 - Springer
Prompt learning has attracted broad attention in computer vision since large pre-trained
vision-language models (VLMs) emerged. Based on the close relationship between vision …

Follow the rules: reasoning for video anomaly detection with large language models

Y Yang, K Lee, B Dariush, Y Cao, SY Lo - European Conference on …, 2024 - Springer
Video Anomaly Detection (VAD) is crucial for applications such as security
surveillance and autonomous driving. However, existing VAD methods provide little …

Bridge the Modality and Capability Gaps in Vision-Language Model Selection

C Yi, Y He, DC Zhan, HJ Ye - Advances in Neural …, 2025 - proceedings.neurips.cc
Vision Language Models (VLMs) excel in zero-shot image classification by pairing
images with textual category names. The expanding variety of Pre-Trained VLMs enhances …

Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification

C Yi, L Ren, DC Zhan, HJ Ye - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
CLIP showcases exceptional cross-modal matching capabilities due to its training on image-
text contrastive learning tasks. However, without specific optimization for unimodal scenarios …

Convolutional Prompting meets Language Models for Continual Learning

A Roy, R Moulick, VK Verma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Continual Learning (CL) enables machine learning models to learn from continuously
shifting new training data in absence of data from old tasks. Recently pre-trained vision …