Unified 3D segmenter as prototypical classifiers

Z Qin, C Han, Q Wang, X Nie, Y Yin… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
The task of point cloud segmentation, comprising semantic, instance, and panoptic
segmentation, has been mainly tackled by designing task-specific network architectures …

Soft prompt tuning for augmenting dense retrieval with large language models

Z Peng, X Wu, Q Wang, Y Fang - Knowledge-Based Systems, 2025 - Elsevier
Dense retrieval (DR) converts queries and documents into dense embeddings and
measures the similarity between queries and documents in vector space. One of the major …
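For context on the mechanism this abstract describes, a minimal dense-retrieval loop might look as follows. This is an illustrative sketch, not the paper's code: the encoder is a random stand-in for what would be a trained bi-encoder, and all names and dimensions are assumptions.

import numpy as np

def encode(texts, dim=768, seed=0):
    # Stand-in encoder: in a real DR system this is a trained
    # bi-encoder mapping each text to a dense embedding.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(len(texts), dim))

def retrieve(query, documents, k=2):
    q = encode([query], seed=1)[0]
    d = encode(documents, seed=2)
    # Rank documents by cosine similarity in the shared vector space.
    scores = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    top = np.argsort(-scores)[:k]
    return [(documents[i], float(scores[i])) for i in top]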

AMD: Automatic multi-step distillation of large-scale vision models

C Han, Q Wang, SA Dianat, M Rabbani… - European Conference on Computer Vision, 2024 - Springer
Transformer-based architectures have become the de facto standard models for diverse
vision tasks owing to their superior performance. As the size of these transformer-based …
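As background for the distillation setting named in the title, the single-step building block is typically a soft KL term against the teacher plus a hard-label term. A hedged sketch under common conventions follows; AMD's multi-step teacher-student schedule is not reproduced, and the hyperparameters are assumptions.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft target: KL between temperature-scaled teacher and student
    # distributions, rescaled by T^2 as in standard logit distillation.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard target: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard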

PromptKD: Unsupervised prompt distillation for vision-language models

Z Li, X Li, X Fu, X Zhang, W Wang… - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024 - openaccess.thecvf.com
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …

MPT: Multimodal Prompt Tuning for Zero-shot Instruction Learning

T Wang, Y Liu, JC Liang, Y Cui, Y Mao, S Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) demonstrate remarkable performance across
a wide range of domains, with increasing emphasis on enhancing their zero-shot …

Efficient multimodal semantic segmentation via dual-prompt learning

S Dong, Y Feng, Q Yang, Y Huang… - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024 - ieeexplore.ieee.org
Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving
semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing …

Visual Fourier Prompt Tuning

R Zeng, C Han, Q Wang, C Wu… - Advances in Neural Information Processing Systems, 2025 - proceedings.neurips.cc
With the scale of vision Transformer-based models continuing to grow, fine-tuning these
large-scale pretrained models for new tasks has become increasingly parameter-intensive …
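A loose sketch of the idea the title suggests: learnable visual prompt tokens passed through a fast Fourier transform before being prepended to patch embeddings. This is an assumption-laden illustration of the general mechanism, not the authors' architecture; every name and shape here is hypothetical.

import torch
import torch.nn as nn

class FourierPrompt(nn.Module):
    # Illustrative only: learnable prompts transformed with a 2D FFT,
    # loosely inspired by the paper's title, not its exact design.
    def __init__(self, embed_dim=768, n_prompt=10):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt, embed_dim) * 0.02)

    def forward(self, patch_embeds):  # patch_embeds: (batch, seq, dim)
        # FFT over the token and channel axes; keep the real part so the
        # result can be concatenated with ordinary embeddings.
        fp = torch.fft.fft2(self.prompt).real
        fp = fp.unsqueeze(0).expand(patch_embeds.size(0), -1, -1)
        return torch.cat([fp, patch_embeds], dim=1)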

Be Confident in What You Know: Bayesian Parameter Efficient Fine-Tuning of Vision Foundation Models

D Pandey, S Pyakurel, Q Yu - Advances in Neural Information Processing Systems, 2025 - proceedings.neurips.cc
Large transformer-based foundation models have been commonly used as pre-trained
models that can be adapted to different challenging datasets and settings with state-of-the …

APrompt: Attention prompt tuning for efficient adaptation of pre-trained language models

Q Wang, Y Mao, J Wang, H Yu, S Nie… - Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023 - aclanthology.org
With the continuous growth of large language models, the process of fine-tuning these
models for new tasks has become increasingly parameter-intensive. Prompt tuning, a …
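To make the prompt-tuning baseline this abstract invokes concrete, a minimal sketch of standard soft prompt tuning follows; it shows the frozen-backbone idea only, not APrompt's attention-level prompts, and all names are illustrative.

import torch
import torch.nn as nn

class SoftPromptWrapper(nn.Module):
    # Prepends n_prompt learnable embeddings to the (frozen) model's
    # input embeddings; only self.prompt receives gradients, so the
    # trainable parameter count is just n_prompt * embed_dim.
    def __init__(self, embed_dim=768, n_prompt=20):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt, embed_dim) * 0.02)

    def forward(self, input_embeds):  # input_embeds: (batch, seq, dim)
        p = self.prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([p, input_embeds], dim=1)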

Image translation as diffusion visual programmers

C Han, JC Liang, Q Wang, M Rabbani, S Dianat… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image
translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion …