Prompt learning in computer vision: a survey

Y Lei, J Li, Z Li, Y Cao, H Shan - Frontiers of Information Technology & …, 2024 - Springer
Prompt learning has attracted broad attention in computer vision since the large pre-trained
vision-language models (VLMs) exploded. Based on the close relationship between vision …

Prompting language-informed distribution for compositional zero-shot learning

W Bao, L Chen, H Huang, Y Kong - European Conference on Computer …, 2024 - Springer
Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional
visual concepts, e.g., sliced tomatoes, where the model is learned only from the seen …

MTA-CLIP: Language-guided semantic segmentation with mask-text alignment

A Das, X Hu, L Jiang, B Schiele - European Conference on Computer …, 2024 - Springer
Recent approaches have shown that large-scale vision-language models such as CLIP can
improve semantic segmentation performance. These methods typically aim for pixel-level …

Improving visual recognition with hyperbolical visual hierarchy mapping

H Kwon, J Jang, J Kim, K Kim… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Visual scenes are naturally organized in a hierarchy where a coarse semantic is recursively
comprised of several fine details. Exploring such a visual hierarchy is crucial to recognize …

CaBins: CLIP-based adaptive bins for monocular depth estimation

E Son, SJ Lee - Proceedings of the IEEE/CVF Conference …, 2024 - openaccess.thecvf.com
Traditional deep-learning models use pre-trained knowledge on large-scale datasets to fine-
tune the model. This strategy significantly improves the performance of downstream tasks …

Image segmentation in foundation model era: A survey

T Zhou, F Zhang, B Chang, W Wang, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Image segmentation is a long-standing challenge in computer vision, studied continuously
over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and …

Clip2uda: Making frozen clip reward unsupervised domain adaptation in 3d semantic segmentation

Y Wu, M Xing, Y Zhang, Y Xie, Y Qu - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Multi-modal Unsupervised Domain Adaptation (MM-UDA) for large-scale 3D semantic
segmentation involves adapting 2D and 3D models to a target domain without labels, which …

Multi-modal recursive prompt learning with mixup embedding for generalization recognition

Y Jia, X Ye, Y Liu, S Guo - Knowledge-Based Systems, 2024 - Elsevier
The contrastive language-image pretraining (CLIP) model has shown promise in
generalization recognition by combining visual and textual embeddings. However, the …

Text-region matching for multi-label image recognition with missing labels

L Ma, H Xie, L Wang, Y Fu, D Sun, H Zhao - Proceedings of the 32nd …, 2024 - dl.acm.org
Recently, large-scale visual language pre-trained (VLP) models have demonstrated
impressive performance across various downstream tasks. Motivated by these …

Task-Conditional Adapter for Multi-Task Dense Prediction

F Jiang, S Wang, X Gong - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Multi-task dense prediction plays an important role in the field of computer vision and has an
abundant array of applications. Its main purpose is to reduce the amount of network training …