Google Scholar

Mm-llms: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 205

[PDF] oup.com

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 153

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Cited by 3554

[PDF] arxiv.org

Mmbench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European conference on …, 2024 - Springer

Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

Cited by 722

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

Cited by 1755

[PDF] arxiv.org

Sharegpt4v: Improving large multi-modal models with better captions

L Chen, J Li, X Dong, P Zhang, C He, J Wang… - … on Computer Vision, 2024 - Springer

Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …

Cited by 446

[PDF] arxiv.org

Qwen2.5 technical report

A Yang, B Yang, B Zhang, B Hui, B Zheng, B Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this report, we introduce Qwen2.5, a comprehensive series of large language models
(LLMs) designed to meet diverse needs. Compared to previous iterations, Qwen 2.5 has …

Cited by 823

[PDF] arxiv.org

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Cited by 350

[PDF] arxiv.org

MM1: methods, analysis and insights from multimodal LLM pre-training

B McKinzie, Z Gan, JP Fauconnier, S Dodge… - … on Computer Vision, 2024 - Springer

In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …

Cited by 180

[PDF] arxiv.org

Evaluating object hallucination in large vision-language models

Y Li, Y Du, K Zhou, J Wang, WX Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

Inspired by the superior language abilities of large language models (LLM), large vision-
language models (LVLM) have been recently explored by integrating powerful LLMs for …

Cited by 735
