A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities
Few-shot learning (FSL) has emerged as an effective learning method and shows great
potential. Despite the recent creative works in tackling FSL tasks, learning valid information …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Visual prompt tuning
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
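The VPT snippet above describes prepending learnable prompt tokens to a frozen vision transformer instead of fine-tuning the whole backbone. A minimal sketch of that idea (shallow prompting only; the sizes and names here are illustrative, not from the paper's code):

```python
import numpy as np

def prepend_visual_prompts(patch_tokens, prompt_tokens):
    """Shallow visual prompt tuning, sketched: learnable prompt tokens are
    concatenated with the (frozen) patch embeddings before the transformer
    encoder; only the prompts and the task head would be trained."""
    # patch_tokens: (num_patches, dim); prompt_tokens: (num_prompts, dim)
    return np.concatenate([prompt_tokens, patch_tokens], axis=0)

dim, n_patches, n_prompts = 768, 196, 10   # ViT-B/16-style sizes (illustrative)
patches = np.zeros((n_patches, dim))       # frozen patch embeddings
prompts = np.zeros((n_prompts, dim))       # would be learnable parameters
tokens = prepend_visual_prompts(patches, prompts)
print(tokens.shape)  # (206, 768)
```

During training, gradients would flow only into `prompts` (and a classification head), leaving the pre-trained backbone untouched.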
Conditional prompt learning for vision-language models
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …
VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
CLIP-Adapter: Better vision-language models with feature adapters
Large-scale contrastive vision-language pretraining has shown significant progress in visual
representation learning. Unlike traditional visual systems trained by a fixed set of discrete …
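The CLIP-Adapter snippet above refers to adapting frozen CLIP features with a lightweight module rather than fine-tuning the encoder. The commonly described form is a small bottleneck MLP blended with the original feature via a residual ratio; a sketch under that assumption (dimensions, weight names, and the ratio value are illustrative):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def clip_adapter(feat, W_down, W_up, alpha=0.2):
    """Residual feature adapter in the spirit of CLIP-Adapter: a two-layer
    bottleneck MLP whose output is mixed with the original (frozen) CLIP
    feature via a residual ratio alpha."""
    adapted = relu(feat @ W_down) @ W_up
    return alpha * adapted + (1.0 - alpha) * feat

rng = np.random.default_rng(0)
d, r = 512, 64                       # feature dim, bottleneck dim (illustrative)
feat = rng.standard_normal(d)        # frozen CLIP image or text feature
W_down = rng.standard_normal((d, r)) * 0.02
W_up = rng.standard_normal((r, d)) * 0.02
out = clip_adapter(feat, W_down, W_up)
print(out.shape)  # (512,)
```

Only the adapter weights (`W_down`, `W_up`) would be trained on the downstream data; with `alpha=0` the module reduces to the identity on the frozen feature.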
DenseCLIP: Language-guided dense prediction with context-aware prompting
Recent progress has shown that large-scale pre-training using contrastive image-text pairs
can be a promising alternative for high-quality visual representation learning from natural …
Segment anything in high quality
The recent Segment Anything Model (SAM) represents a big leap in scaling up
segmentation models, allowing for powerful zero-shot capabilities and flexible prompting …
Visual-language prompt tuning with knowledge-guided context optimization
Prompt tuning is an effective way to adapt the pretrained visual-language model (VLM) to
the downstream task using task-related textual tokens. Representative CoOp-based works …