MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv:…, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

A survey on LoRA of large language models

Y Mao, Y Ge, Y Fan, W Xu, Y Mi, Z Hu… - Frontiers of Computer …, 2025 - Springer
Abstract Low-Rank Adaptation (LoRA), which updates the dense neural network layers with
pluggable low-rank matrices, is one of the best-performing parameter-efficient fine-tuning …

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European conference on …, 2024 - Springer
Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

DeepSeek-VL: Towards real-world vision-language understanding

H Lu, W Liu, B Zhang, B Wang, K Dong, B Liu… - arXiv preprint arXiv:…, 2024 - arxiv.org
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …

MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems?

R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu… - … on Computer Vision, 2024 - Springer
The remarkable progress of Multi-modal Large Language Models (MLLMs) has gained
unparalleled attention. However, their capabilities in visual math problem-solving remain …

Blink: Multimodal large language models can see but not perceive

X Fu, Y Hu, B Li, Y Feng, H Wang, X Lin, D Roth… - … on Computer Vision, 2024 - Springer
We introduce Blink, a new benchmark for multimodal large language models (MLLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …

OPERA: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Q Huang, X Dong, P Zhang, B Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Hallucination, posed as a pervasive challenge of multi-modal large language models
(MLLMs), has significantly impeded their real-world usage that demands precise judgment …

InternLM-XComposer-2.5: A versatile large vision-language model supporting long-contextual input and output

P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arXiv preprint arXiv:…, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …