Google Scholar

Instruction tuning for large language models: A survey

S Zhang, L Dong, X Li, S Zhang, X Sun, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper surveys research works in the quickly advancing field of instruction tuning (IT),
which can also be referred to as supervised fine-tuning (SFT) …

Cited by 742. All 5 versions.

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 611. All 4 versions.

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com

We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Cited by 8673. All 10 versions.

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Cited by 2401. All 11 versions.

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Cited by 611. All 7 versions.

The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Cited by 590. All 4 versions.

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Cited by 430. All 8 versions.

SAM 2: Segment anything in images and videos

N Ravi, V Gabeur, YT Hu, R Hu, C Ryali, T Ma… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Segment Anything Model 2 (SAM 2), a foundation model towards solving
promptable visual segmentation in images and videos. We build a data engine, which …

Cited by 450. All 2 versions.

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

Cited by 462. All 11 versions.

Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org

Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Cited by 310. All 10 versions.

The cityscapes dataset for semantic urban scene understanding

Instruction tuning for large language models: A survey‏
