How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity

Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang… - Science China …, 2024 - Springer
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the
performance of vision large language models (VLLMs), existing visual instruction tuning …

DenseFusion-1M: Merging vision experts for comprehensive multimodal perception

X Li, F Zhang, H Diao, Y Wang, X Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Multimodal Large Language Models (MLLMs) increasingly emphasize complex
understanding of various visual elements, including multiple objects, text information, and …

A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
systems by integrating powerful Large Language Models (LLMs) with various modality …

Deepseek-vl2: Mixture-of-experts vision-language models for advanced multimodal understanding

Z Wu, X Chen, Z Pan, X Liu, W Liu, D Dai… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-VL2, an advanced series of large Mixture-of-Experts (MoE) Vision-
Language Models that significantly improves upon its predecessor, DeepSeek-VL, through …

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian… - Visual Intelligence, 2024 - Springer
Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …

Enhancing the reasoning ability of multimodal large language models via mixed preference optimization

W Wang, Z Chen, W Wang, Y Cao, Y Liu, Z Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing open-source multimodal large language models (MLLMs) generally follow a
training process involving pre-training and supervised fine-tuning. However, these models …

The curse of multi-modalities: Evaluating hallucinations of large multimodal models across language, visual, and audio

S Leng, Y Xing, Z Cheng, Y Zhou, H Zhang, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in large multimodal models (LMMs) have significantly enhanced
performance across diverse tasks, with ongoing efforts to further integrate additional …

A survey on multimodal benchmarks: In the era of large ai models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …