Unified 3D segmenter as prototypical classifiers
The task of point cloud segmentation, comprising semantic, instance, and panoptic
segmentation, has been mainly tackled by designing task-specific network architectures …
Soft prompt tuning for augmenting dense retrieval with large language models
Dense retrieval (DR) converts queries and documents into dense embeddings and
measures the similarity between queries and documents in vector space. One of the major …
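The abstract describes the core mechanism of dense retrieval: queries and documents are mapped to dense vectors and compared by similarity in that vector space. Below is a minimal sketch of the scoring step only, using toy hand-written vectors in place of a trained encoder's outputs; all names and values are illustrative, not from the paper.

```python
import numpy as np

# Toy embeddings standing in for encoder outputs (a real dense
# retriever would produce these with a trained dual encoder).
query = np.array([0.9, 0.1, 0.3])
docs = np.array([
    [0.8, 0.2, 0.4],   # doc 0: close to the query
    [0.1, 0.9, 0.0],   # doc 1: off-topic
    [0.5, 0.5, 0.5],   # doc 2: middling
])

def cosine_scores(q, D):
    # Similarity in vector space: cosine similarity is the dot
    # product of L2-normalized vectors.
    qn = q / np.linalg.norm(q)
    Dn = D / np.linalg.norm(D, axis=1, keepdims=True)
    return Dn @ qn

scores = cosine_scores(query, docs)
ranking = np.argsort(-scores)  # best match first; doc 0 ranks highest here
print(ranking)
```

In practice the document vectors are precomputed and indexed (e.g., with an approximate nearest-neighbor index), so retrieval reduces to this inner-product lookup at query time.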
AMD: Automatic multi-step distillation of large-scale vision models
Transformer-based architectures have become the de facto standard models for diverse
vision tasks owing to their superior performance. As the size of these transformer-based …
PromptKD: Unsupervised prompt distillation for vision-language models
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …
MPT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across
a wide range of domains, with increasing emphasis on enhancing their zero-shot …
Efficient multimodal semantic segmentation via dual-prompt learning
Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving
semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing …
Visual Fourier Prompt Tuning
With the scale of vision Transformer-based models continuing to grow, finetuning these
large-scale pretrained models for new tasks has become increasingly parameter-intensive …
Be Confident in What You Know: Bayesian Parameter Efficient Fine-Tuning of Vision Foundation Models
Large transformer-based foundation models have been commonly used as pre-trained
models that can be adapted to different challenging datasets and settings with state-of-the …
Aprompt: Attention prompt tuning for efficient adaptation of pre-trained language models
With the continuous growth of large language models, the process of fine-tuning these
models for new tasks has become increasingly parameter-intensive. Prompt tuning, a …
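The abstract invokes prompt tuning as a parameter-efficient alternative to full fine-tuning: a small set of learnable embeddings is prepended to the input sequence while the pretrained model stays frozen. Below is a minimal PyTorch sketch of that standard idea; the class name, parameter names, and stand-in backbone are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

class SoftPromptModel(nn.Module):
    # Minimal prompt-tuning sketch: assumes a backbone that consumes a
    # sequence of input embeddings of shape (batch, seq_len, embed_dim).
    def __init__(self, backbone, embed_dim, n_prompt_tokens=10):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # The only trainable parameters: a small bank of prompt vectors.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):
        b = input_embeds.size(0)
        # Prepend the shared prompt vectors to every sequence in the batch.
        prompts = self.prompt.unsqueeze(0).expand(b, -1, -1)
        return self.backbone(torch.cat([prompts, input_embeds], dim=1))

# Usage with a stand-in backbone (a single linear layer for illustration):
backbone = nn.Sequential(nn.Linear(16, 16))
model = SoftPromptModel(backbone, embed_dim=16, n_prompt_tokens=4)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only the prompt bank is trainable
```

Because gradients flow only into `self.prompt`, each downstream task adds just `n_prompt_tokens * embed_dim` parameters on top of the shared frozen model.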
Image translation as diffusion visual programmers
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image
translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion …