CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

S Jha, D Gong, L Yao - arXiv preprint arXiv:2403.19137, 2024 - arxiv.org
Continual learning (CL) aims to help deep neural networks learn new knowledge while
retaining what has been learned. Owing to their powerful generalizability, pre-trained vision …

AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

Y Zhu, Y Ji, Z Zhao, G Wu, L Wang - arXiv preprint arXiv:2407.04603, 2024 - arxiv.org
Pre-trained vision-language models (VLMs) have shown impressive results in various visual
classification tasks. However, we often fail to fully unleash their potential when adapting …

BAPLe: Backdoor Attacks on Medical Foundation Models Using Prompt Learning

A Hanif, F Shamshad, M Awais, M Naseer… - … Conference on Medical …, 2024 - Springer
Medical foundation models are gaining prominence in the medical community for their ability
to derive general representations from extensive collections of medical image-text pairs …

UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities

MU Khattak, S Kunhimon, M Naseer, S Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLMs) trained via contrastive learning have achieved notable
success in natural image tasks. However, their application in the medical domain remains …

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

Y Guo, S Zhuang, K Li, Y Qiao… - The Thirty-eighth …, 2024 - proceedings.neurips.cc
Vision-language foundation models (such as CLIP) have recently shown their power in
transfer learning, owing to large-scale image-text pre-training. However, target domain data …

IPO: Interpretable Prompt Optimization for Vision-Language Models

Y Du, W Sun, CGM Snoek - arXiv preprint arXiv:2410.15397, 2024 - arxiv.org
Pre-trained vision-language models like CLIP adapt remarkably well to various
downstream tasks. Nonetheless, their performance heavily depends on the specificity of the …

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

T Koleilat, H Asgariandehkordi, H Rivaz… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated
substantial success in self-supervised representation learning for vision tasks. However …

CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections

MF Imam, RF Marew, J Hassan, M Fiaz, AF Aji… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of foundation models, CLIP has emerged as a powerful tool for aligning text and
visual modalities into a common embedding space. However, the alignment objective used …

How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?

S Wang, C Liu, R Arcucci - arXiv preprint arXiv:2409.00543, 2024 - arxiv.org
Recent advancements in medical vision-language pre-training (MedVLP) have significantly
enhanced zero-shot medical vision tasks such as image classification by leveraging large …

XDT-CXR: Investigating Cross-Disease Transferability in Zero-Shot Binary Classification of Chest X-Rays

U Rahman, A Basu, MU Khattak… - arXiv preprint arXiv …, 2024 - arxiv.org
This study explores the concept of cross-disease transferability (XDT) in medical imaging,
focusing on the potential of binary classifiers trained on one disease to perform zero-shot …