Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …
Maple: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
Visual-language prompt tuning with knowledge-guided context optimization
Prompt tuning is an effective way to adapt the pretrained visual-language model (VLM) to
the downstream task using task-related textual tokens. Representative CoOp-based works …
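For context, the CoOp-style mechanism this abstract refers to can be sketched as a handful of learnable context vectors prepended to each class-name embedding while the pretrained VLM itself stays frozen. A minimal PyTorch sketch, where the embedding size and token shapes are illustrative assumptions rather than details from the paper:

    import torch
    import torch.nn as nn

    class LearnableContext(nn.Module):
        """CoOp-style prompt tuning: M learnable context vectors shared by all classes."""
        def __init__(self, n_ctx: int = 16, embed_dim: int = 512):
            super().__init__()
            # The only trainable parameters; the pretrained VLM stays frozen.
            self.ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

        def forward(self, name_embeds: torch.Tensor) -> torch.Tensor:
            # name_embeds: (n_classes, n_name_tokens, embed_dim) class-name embeddings
            ctx = self.ctx.unsqueeze(0).expand(name_embeds.shape[0], -1, -1)
            # Build "[V]_1 ... [V]_M <class name>" prompts for the frozen text encoder.
            return torch.cat([ctx, name_embeds], dim=1)

The concatenated prompts are then passed through the frozen text encoder, and only the context vectors receive gradients.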
Self-regulating prompts: Foundational model adaptation without forgetting
Prompt learning has emerged as an efficient alternative to fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …
Generalized out-of-distribution detection and beyond in vision language model era: A survey
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine
learning systems and has shaped the field of OOD detection. Meanwhile, several other …
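As background for the survey's framing, the classical OOD-detection baseline is the maximum softmax probability (MSP) score: an input whose top softmax confidence falls below a threshold is flagged as OOD. A minimal sketch (the 0.5 threshold is an arbitrary illustrative choice):

    import torch
    import torch.nn.functional as F

    def msp_score(logits: torch.Tensor) -> torch.Tensor:
        # Maximum softmax probability per sample; lower values suggest OOD inputs.
        return F.softmax(logits, dim=-1).max(dim=-1).values

    logits = torch.randn(4, 10)          # a batch of 4 samples over 10 classes
    is_ood = msp_score(logits) < 0.5     # 0.5 is an illustrative threshold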
Prompt-aligned gradient for prompt tuning
Thanks to the large pre-trained vision-language models (VLMs) like CLIP, we can craft a
zero-shot classifier by discrete prompt design, e.g., the confidence score of an image …
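To make "discrete prompt design" concrete: a zero-shot CLIP classifier scores an image against hand-written prompts such as "a photo of a <class>", and the softmax over image-text similarities serves as the confidence score. A minimal sketch using OpenAI's clip package; the model name, file name, and class list are illustrative assumptions:

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    classes = ["cat", "dog", "car"]
    # Discrete prompt design: one hand-crafted template per class.
    text = clip.tokenize([f"a photo of a {c}" for c in classes]).to(device)
    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

    with torch.no_grad():
        image_feat = model.encode_image(image)
        text_feat = model.encode_text(text)
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        # Confidence score of the image under each class prompt.
        probs = (100.0 * image_feat @ text_feat.T).softmax(dim=-1)

    print(dict(zip(classes, probs[0].tolist())))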
Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models
The ability to quickly learn a new task with minimal instruction, known as few-shot learning, is
a central aspect of intelligent agents. Classical few-shot benchmarks make use of few-shot …
Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models
A long-standing goal of AI systems is to perform complex multimodal reasoning like humans.
Recently, large language models (LLMs) have made remarkable strides in such multi-step …
Cheap and quick: Efficient vision-language instruction tuning for large language models
Recently, interest has grown in extending the multimodal capability of large
language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next …
Neural prompt search
The size of vision models has grown exponentially over the last few years, especially after
the emergence of the Vision Transformer. This has motivated the development of parameter …
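For reference, the parameter-efficient modules that such prompt/module search spaces typically range over include adapters, LoRA, and prompt tokens; a minimal bottleneck-adapter sketch (the 768/64 dimensions are illustrative assumptions, and this is a generic adapter, not the paper's searched design):

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter: a small residual MLP inserted into a frozen ViT block."""
        def __init__(self, dim: int = 768, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # The residual connection preserves the frozen backbone's features.
            return x + self.up(torch.relu(self.down(x)))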