Prompt-aligned gradient for prompt tuning

B Zhu, Y Niu, Y Han, Y Wu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we can craft a
zero-shot classifier by discrete prompt design, e.g., the confidence score of an image …
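
The mechanism this snippet alludes to (scoring an image against hand-crafted class prompts with CLIP) can be sketched as follows, assuming the open-source clip package; the class names, prompt template, and image path are placeholders, not the paper's setup.

    import torch
    import clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    # Discrete prompt design: one hand-written sentence per class (placeholder names).
    class_names = ["cat", "dog", "car"]
    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(prompts)
        # Cosine similarity between the image and each class prompt is the confidence score.
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

    print(dict(zip(class_names, probs[0].tolist())))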

TCP: Textual-based class-aware prompt tuning for visual-language model

H Yao, R Zhang, C Xu - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Prompt tuning represents a valuable technique for adapting pre-trained visual-language
models (VLM) to various downstream tasks. Recent advancements in CoOp-based methods …
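
The CoOp-based methods referenced here replace hand-written prompt text with learnable context vectors prepended to each class-name embedding, trained while the VLM encoders stay frozen. A minimal sketch of that idea in PyTorch (dimensions, names, and initialization are illustrative, not the paper's code):

    import torch
    import torch.nn as nn

    class SoftPrompt(nn.Module):
        """Learnable context vectors shared across classes (CoOp-style sketch)."""
        def __init__(self, class_embeddings, n_ctx=16, ctx_dim=512):
            super().__init__()
            # n_ctx learnable "word" embeddings, randomly initialized.
            self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
            # Frozen token embeddings of the class names, shape [n_cls, n_tok, ctx_dim].
            self.register_buffer("cls_emb", class_embeddings)

        def forward(self):
            n_cls = self.cls_emb.shape[0]
            ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)   # [n_cls, n_ctx, ctx_dim]
            # Prepend the shared context to every class-name embedding sequence.
            return torch.cat([ctx, self.cls_emb], dim=1)        # [n_cls, n_ctx + n_tok, ctx_dim]

    # Usage sketch with random placeholder class-name embeddings (3 classes, 4 tokens each).
    # Only self.ctx is optimized; the image and text encoders stay frozen.
    prompt = SoftPrompt(torch.randn(3, 4, 512))
    print(prompt().shape)   # torch.Size([3, 20, 512])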

Balancing act: distribution-guided debiasing in diffusion models

R Parihar, A Bhat, A Basu, S Mallick… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Diffusion Models (DMs) have emerged as powerful generative models with
unprecedented image generation capability. These models are widely used for data …

ArGue: Attribute-guided prompt tuning for vision-language models

X Tian, S Zou, Z Yang, J Zhang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Although soft prompt tuning is effective in efficiently adapting Vision-Language (V&L)
models for downstream tasks, it shows limitations in dealing with distribution shifts. We …

Generalized logit adjustment: Calibrating fine-tuned models by removing label bias in foundation models

B Zhu, K Tang, Q Sun, H Zhang - Advances in Neural …, 2023 - proceedings.neurips.cc
Foundation models like CLIP allow zero-shot transfer on various tasks without additional
training data. Yet, the zero-shot performance is less competitive than a fully supervised one …
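
This snippet is about calibrating a fine-tuned model by removing label bias. The classic logit-adjustment rule, subtracting scaled log class priors from the logits, conveys the flavor; the paper's generalized variant estimates the bias of the foundation model itself, which the sketch below does not do. The prior values are made-up examples.

    import torch

    def adjust_logits(logits, class_prior, tau=1.0):
        """Classic logit adjustment: subtract tau * log(prior) per class.

        logits: [batch, n_cls]; class_prior: [n_cls], summing to 1.
        """
        return logits - tau * torch.log(class_prior + 1e-12)

    # Illustrative only: a skewed class prior estimated from (pre)training data.
    prior = torch.tensor([0.7, 0.2, 0.1])
    logits = torch.tensor([[2.0, 1.5, 1.4]])
    print(adjust_logits(logits, prior).softmax(dim=-1))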

Improved visual fine-tuning with natural language supervision

J Wang, Y Xu, J Hu, M Yan, J Sang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Fine-tuning a visual pre-trained model can leverage the semantic information from large-
scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with …

Fine-Tuning for Few-Shot Image Classification by Multimodal Prototype Regularization

Q Wu, J Qi, D Zhang, H Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Large pre-trained vision-language models, such as CLIP [Radford et al. 2021], have
demonstrated remarkable performance in few-shot image classification. To facilitate the …

Robust Fine-tuning of Zero-shot Models via Variance Reduction

B Zhu, J Cui, H Zhang - Advances in Neural Information …, 2025 - proceedings.neurips.cc
When fine-tuning zero-shot models like CLIP, our desideratum is for the fine-tuned model to
excel on both in-distribution (ID) and out-of-distribution (OOD) data. Recently, ensemble-based …
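
The truncated mention of "ensemble-based" approaches likely refers to the weight-space ensembling line of work (e.g., WiSE-FT), which interpolates the zero-shot and fine-tuned weights rather than averaging predictions; the sketch below shows that interpolation, with the mixing coefficient alpha as a tunable assumption.

    import torch

    def weight_space_ensemble(zeroshot_sd, finetuned_sd, alpha=0.5):
        """Interpolate two state dicts: theta = (1 - alpha) * theta_zs + alpha * theta_ft."""
        return {
            k: (1.0 - alpha) * zeroshot_sd[k].float() + alpha * finetuned_sd[k].float()
            for k in zeroshot_sd
        }

    # Usage sketch: model.load_state_dict(weight_space_ensemble(zs_sd, ft_sd, alpha=0.5))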

Identifying implicit social biases in vision-language models

K Hamidieh, H Zhang, W Gerych, T Hartvigsen… - Proceedings of the …, 2024 - ojs.aaai.org
Vision-language models, like CLIP (Contrastive Language-Image Pretraining), are
becoming increasingly popular for a wide range of multimodal retrieval tasks. However, prior …

Selective vision-language subspace projection for few-shot CLIP

X Zhu, B Zhu, Y Tan, S Wang, Y Hao… - Proceedings of the 32nd …, 2024 - dl.acm.org
Vision-language models such as CLIP are capable of mapping data from different modalities
into a unified feature space, enabling zero/few-shot inference by measuring the similarity of …
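
The snippet describes zero/few-shot inference by similarity in CLIP's shared feature space. A generic few-shot baseline along those lines (not the paper's subspace-projection method) classifies a query by cosine similarity to class prototypes averaged over a handful of support embeddings:

    import torch
    import torch.nn.functional as F

    def prototype_classify(query_feat, support_feats, support_labels, n_cls):
        """query_feat: [d]; support_feats: [n_support, d]; support_labels: [n_support]."""
        support_feats = F.normalize(support_feats, dim=-1)
        query_feat = F.normalize(query_feat, dim=-1)
        # Average the few-shot support embeddings per class to get one prototype each.
        protos = torch.stack(
            [support_feats[support_labels == c].mean(0) for c in range(n_cls)]
        )
        protos = F.normalize(protos, dim=-1)
        # Nearest prototype by cosine similarity.
        return (query_feat @ protos.T).argmax().item()

    # e.g. prototype_classify(torch.randn(512), torch.randn(6, 512),
    #                         torch.tensor([0, 0, 1, 1, 2, 2]), n_cls=3)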