Prompt learning in computer vision: a survey

Y Lei, J Li, Z Li, Y Cao, H Shan - Frontiers of Information Technology & …, 2024 - Springer

Prompt learning has attracted broad attention in computer vision since the large pre-trained
vision-language models (VLMs) exploded. Based on the close relationship between vision …

Cited by 8 · All 6 versions

[PDF] arxiv.org

Prompting language-informed distribution for compositional zero-shot learning

W Bao, L Chen, H Huang, Y Kong - European Conference on Computer …, 2024 - Springer

Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional
visual concepts, e.g., sliced tomatoes, where the model is learned only from the seen …

Cited by 21 · All 8 versions

[PDF] arxiv.org

MTA-CLIP: Language-guided semantic segmentation with mask-text alignment

A Das, X Hu, L Jiang, B Schiele - European Conference on Computer …, 2024 - Springer

Recent approaches have shown that large-scale vision-language models such as CLIP can
improve semantic segmentation performance. These methods typically aim for pixel-level …

Cited by 3 · All 9 versions

[PDF] thecvf.com

Improving visual recognition with hyperbolical visual hierarchy mapping

H Kwon, J Jang, J Kim, K Kim… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Visual scenes are naturally organized in a hierarchy where a coarse semantic is recursively
comprised of several fine details. Exploring such a visual hierarchy is crucial to recognize …

Cited by 2 · All 7 versions

[PDF] thecvf.com

CaBins: CLIP-based adaptive bins for monocular depth estimation

E Son, SJ Lee - Proceedings of the IEEE/CVF Conference …, 2024 - openaccess.thecvf.com

Traditional deep-learning models use pre-trained knowledge on large-scale datasets to fine-
tune the model. This strategy significantly improves the performance of downstream tasks …

Cited by 2 · All 3 versions

[PDF] arxiv.org

Image segmentation in foundation model era: A survey

T Zhou, F Zhang, B Chang, W Wang, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

Image segmentation is a long-standing challenge in computer vision, studied continuously
over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and …

Cited by 6 · All 3 versions

[PDF] openreview.net

CLIP2UDA: Making frozen CLIP reward unsupervised domain adaptation in 3D semantic segmentation

Y Wu, M Xing, Y Zhang, Y Xie, Y Qu - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Multi-modal Unsupervised Domain Adaptation (MM-UDA) for large-scale 3D semantic
segmentation involves adapting 2D and 3D models to a target domain without labels, which …

Cited by 3 · All 2 versions

[PDF] ssrn.com

Multi-modal recursive prompt learning with mixup embedding for generalization recognition

Y Jia, X Ye, Y Liu, S Guo - Knowledge-Based Systems, 2024 - Elsevier

The contrastive language-image pretraining (CLIP) model has shown promise in
generalization recognition by combining visual and textual embeddings. However, the …

Cited by 3 · All 3 versions

[PDF] arxiv.org

Text-region matching for multi-label image recognition with missing labels

L Ma, H Xie, L Wang, Y Fu, D Sun, H Zhao - Proceedings of the 32nd …, 2024 - dl.acm.org

Recently, large-scale visual language pre-trained (VLP) models have demonstrated
impressive performance across various downstream tasks. Motivated by these …

Cited by 1 · All 5 versions

[PDF] openreview.net

Task-Conditional Adapter for Multi-Task Dense Prediction

F Jiang, S Wang, X Gong - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org

Multi-task dense prediction plays an important role in the field of computer vision and has an
abundant array of applications. Its main purpose is to reduce the amount of network training …

Cited by 1 · All 2 versions

Probabilistic prompt learning for dense prediction

Prompt learning in computer vision: a survey
