RS-CLIP: Zero-shot remote sensing scene classification via contrastive vision-language supervision
Zero-shot remote sensing scene classification aims to solve the scene classification problem
on unseen categories and has attracted considerable research attention in the remote sensing …
Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …
Test-time prompt tuning for zero-shot generalization in vision-language models
Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot
generalization in many downstream tasks with properly designed text prompts. Instead of …
Self-regulating prompts: Foundational model adaptation without forgetting
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …
What does a platypus look like? Generating customized prompts for zero-shot image classification
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …
Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models
The ability to quickly learn a new task with minimal instruction, known as few-shot learning, is
a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot …
Generalized out-of-distribution detection and beyond in vision language model era: A survey
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine
learning systems and has shaped the field of OOD detection. Meanwhile, several other …
S-Prompts learning with pre-trained transformers: An Occam's razor for domain incremental learning
State-of-the-art deep neural networks are still struggling to address the catastrophic
forgetting problem in continual learning. In this paper, we propose one simple paradigm …