CLIP in medical imaging: A comprehensive survey

Z Zhao, Y Liu, H Wu, M Wang, Y Li, S Wang… - arXiv preprint arXiv…, 2023 - arxiv.org
Contrastive Language-Image Pre-training (CLIP), a simple yet effective pre-training
paradigm, successfully introduces text supervision to vision models. It has shown promising …

Video-ChatGPT: Towards detailed video understanding via large vision and language models

M Maaz, H Rasheed, S Khan, FS Khan - arXiv preprint arXiv:2306.05424, 2023 - arxiv.org
Conversation agents fueled by Large Language Models (LLMs) are providing a new way to
interact with visual data. While there have been initial attempts at image-based …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …

MaPLe: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …

Self-regulating prompts: Foundational model adaptation without forgetting

MU Khattak, ST Wasim, M Naseer… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …

Aligning bag of regions for open-vocabulary object detection

S Wu, W Zhang, S Jin, W Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-trained vision-language models (VLMs) learn to align vision and language
representations on large-scale datasets, where each image-text pair usually contains a bag …

PLA: Language-driven open-vocabulary 3D scene understanding

R Ding, J Yang, C Xue, W Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Open-vocabulary scene understanding aims to localize and recognize unseen categories
beyond the annotated label space. The recent breakthrough of 2D open-vocabulary …

Fine-tuned CLIP models are efficient video learners

H Rasheed, MU Khattak, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale multi-modal training with image-text pairs imparts strong generalization to the
CLIP model. Since training on a similar scale for videos is infeasible, recent approaches focus on …

CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching

X Wu, F Zhu, R Zhao, H Li - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Open-vocabulary detection (OVD) is an object detection task that aims to detect objects
from novel categories beyond the base categories on which the detector is trained. Recent …

Contextual object detection with multimodal large language models

Y Zang, W Li, J Han, K Zhou, CC Loy - International Journal of Computer …, 2024 - Springer
Recent Multimodal Large Language Models (MLLMs) are remarkable in vision-
language tasks, such as image captioning and question answering, but lack the essential …