CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …

Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to achieve promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …

A pilot study of query-free adversarial attack against stable diffusion

H Zhuang, Y Zhang, S Liu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable
Diffusion, less research attention is paid to its adversarial robustness. In this work, we study …

Few-shot adversarial prompt learning on vision-language models

Y Zhou, X **a, Z Lin, B Han… - Advances in Neural …, 2025 - proceedings.neurips.cc
The vulnerability of deep neural networks to imperceptible adversarial perturbations has
attracted widespread attention. Inspired by the success of vision-language foundation …

Pre-trained model guided fine-tuning for zero-shot adversarial robustness

S Wang, J Zhang, Z Yuan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Large-scale pre-trained vision-language models like CLIP have demonstrated impressive
performance across various tasks and exhibit remarkable zero-shot generalization capability …

One prompt word is enough to boost adversarial robustness for pre-trained vision-language models

L Li, H Guan, J Qiu, M Spratling - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having
remarkable generalization ability, are highly vulnerable to adversarial examples. This work …

Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers

S Yang, J Bai, K Gao, Y Yang, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Given the power of vision transformers, a new learning paradigm, pre-training and then
prompting, makes it more efficient and effective to address downstream visual recognition …

A comprehensive survey of robust deep learning in computer vision

J Liu, Y Jin - Journal of Automation and Intelligence, 2023 - Elsevier
Deep learning has made remarkable progress on various tasks. Despite the excellent
performance, deep learning models are still not robust, especially to well-designed …

Convolutional visual prompt for robust visual perception

YY Tsai, C Mao, J Yang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Vision models are often vulnerable to out-of-distribution (OOD) samples without adaptation.
While visual prompts offer a lightweight method of input-space adaptation for large-scale …