Google Scholar

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 1236 · All 12 versions

[HTML] sciencedirect.com

[HTML] A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2024 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by 274 · All 4 versions

[PDF] arxiv.org

Qwen technical report

J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …

Cited by 2482 · All 6 versions

[PDF] stableaiprompts.com

[PDF] The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

Z Yang, L Li, K Lin, J Wang, CC Lin… - arXiv preprint arXiv …, 2023 - stableaiprompts.com

Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory
skills, such as visual understanding, to achieve stronger generic intelligence. In this paper …

Cited by 594 · All 4 versions

[PDF] arxiv.org

MM-Vet: Evaluating large multimodal models for integrated capabilities

W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose MM-Vet, an evaluation benchmark that examines large multimodal models
(LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing …

Cited by 533 · All 6 versions

[PDF] thecvf.com

ViperGPT: Visual inference via Python execution for reasoning

D Surís, S Menon, C Vondrick - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Answering visual queries is a complex task that requires both visual processing and
reasoning. End-to-end models, the dominant approach for this task, do not explicitly …

Cited by 426 · All 9 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 229 · All 7 versions

[PDF] arxiv.org

Visual ChatGPT: Talking, drawing and editing with visual foundation models

C Wu, S Yin, W Qi, X Wang, Z Tang, N Duan - arXiv preprint arXiv …, 2023 - arxiv.org

ChatGPT is attracting a cross-field interest as it provides a language interface with
remarkable conversational competency and reasoning capabilities across many domains …

Cited by 673 · All 3 versions

[PDF] neurips.cc

OBELICS: An open web-scale filtered dataset of interleaved image-text documents

H Laurençon, L Saulnier, L Tronchon… - Advances in …, 2023 - proceedings.neurips.cc

Large multimodal models trained on natural documents, which interleave images and text,
outperform models trained on image-text pairs on various multimodal benchmarks …

Cited by 258 · All 5 versions

[PDF] thecvf.com

Visual programming: Compositional visual reasoning without training

T Gupta, A Kembhavi - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

We present VISPROG, a neuro-symbolic approach to solving complex and compositional
visual tasks given natural language instructions. VISPROG avoids the need for any task …

Cited by 416 · All 8 versions

An empirical study of GPT-3 for few-shot knowledge-based VQA

A Survey of Multimodel Large Language Models‏
