CLIP in medical imaging: A comprehensive survey
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …
Visual tuning
Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the rapid development of pre-trained visual foundation …
A systematic survey of prompt engineering on vision-language foundation models
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …
A pilot study of query-free adversarial attack against Stable Diffusion
Despite the record-breaking performance in Text-to-Image (T2I) generation by Stable
Diffusion, little research attention has been paid to its adversarial robustness. In this work, we study …
Few-shot adversarial prompt learning on vision-language models
The vulnerability of deep neural networks to imperceptible adversarial perturbations has
attracted widespread attention. Inspired by the success of vision-language foundation …
Pre-trained model guided fine-tuning for zero-shot adversarial robustness
Large-scale pre-trained vision-language models like CLIP have demonstrated impressive
performance across various tasks and exhibit remarkable zero-shot generalization capability …
One prompt word is enough to boost adversarial robustness for pre-trained vision-language models
Large pre-trained Vision-Language Models (VLMs) like CLIP, despite having
remarkable generalization ability, are highly vulnerable to adversarial examples. This work …
Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers
Given the power of vision transformers, a new learning paradigm, pre-training and then
prompting, makes it more efficient and effective to address downstream visual recognition …
A comprehensive survey of robust deep learning in computer vision
Deep learning has made remarkable progress on various tasks. Despite this excellent
performance, deep learning models remain non-robust, especially to well-designed …
Convolutional visual prompt for robust visual perception
Vision models are often vulnerable to out-of-distribution (OOD) samples without adaptation.
While visual prompts offer a lightweight method of input-space adaptation for large-scale …