Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Gemini: a family of highly capable multimodal models

Gemini Team, R Anil, S Borgeaud, JB Alayrac, J Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond

J Bai, S Bai, S Yang, S Wang… - arXiv preprint …, 2023 - storage.prod.researchhub.com
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we introduce the Gemini 1.5 family of models, representing the next generation
of highly compute-efficient multimodal models capable of recalling and reasoning over fine …

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

Language is not all you need: Aligning perception with language models

S Huang, L Dong, W Wang, Y Hao… - Advances in …, 2023 - proceedings.neurips.cc
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …

Qwen-VL: A frontier large vision-language model with versatile abilities

J Bai, S Bai, S Yang, S Wang, S Tan, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …