How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

DeepSeek-VL: Towards real-world vision-language understanding

H Lu, W Liu, B Zhang, B Wang, K Dong, B Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …

LLaVA-UHD: An LMM perceiving any aspect ratio and high-resolution images

Z Guo, R Xu, Y Yao, J Cui, Z Ni, C Ge, TS Chua… - … on Computer Vision, 2024 - Springer
Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding
the visual world. Conventional LMMs process images in fixed sizes and limited resolutions …

InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output

P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …

InternLM-XComposer2: Mastering free-form text-image composition and comprehension in vision-language large model

X Dong, P Zhang, Y Zang, Y Cao, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-
form text-image composition and comprehension. This model goes beyond conventional …

Multimodal large language models in health care: applications, challenges, and future outlook

R AlSaad, A Abd-Alrazaq, S Boughorbel… - Journal of medical …, 2024 - jmir.org
In the complex and multidimensional field of medicine, multimodal data are prevalent and
crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types …

Kosmos-2.5: A multimodal literate model

T Lv, Y Huang, J Chen, Y Zhao, Y Jia, L Cui… - arXiv preprint arXiv …, 2023 - arxiv.org
The automatic reading of text-intensive images represents a significant advancement toward
achieving Artificial General Intelligence (AGI). In this paper, we present KOSMOS-2.5, a …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

MobileVLM V2: Faster and stronger baseline for vision language model

X Chu, L Qiao, X Zhang, S Xu, F Wei, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MobileVLM V2, a family of vision language models significantly improved
upon MobileVLM, which proves that a delicate orchestration of novel architectural design, an …