Phi-3 technical report: A highly capable language model locally on your phone

M Abdin, J Aneja, H Awadalla, A Awadallah… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

MiniCPM-V: A GPT-4V level MLLM on your phone

Y Yao, T Yu, A Zhang, C Wang, J Cui, H Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …

LLaVA-UHD: An LMM perceiving any aspect ratio and high-resolution images

Z Guo, R Xu, Y Yao, J Cui, Z Ni, C Ge, TS Chua… - … on Computer Vision, 2024 - Springer
Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding
the visual world. Conventional LMMs process images in fixed sizes and limited resolutions …

InternLM-XComposer-2.5: A versatile large vision language model supporting long-contextual input and output

P Zhang, X Dong, Y Zang, Y Cao, R Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large-vision language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …

VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

A survey of robot intelligence with large language models

H Jeong, H Lee, C Kim, S Shin - Applied Sciences, 2024 - mdpi.com
Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …

Eagle: Exploring the design space for multimodal LLMs with mixture of encoders

M Shi, F Liu, S Wang, S Liao, S Radhakrishnan… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability to accurately interpret complex visual information is a crucial topic of multimodal
large language models (MLLMs). Recent work indicates that enhanced visual perception …

LMMs-Eval: Reality check on the evaluation of large multimodal models

K Zhang, B Li, P Zhang, F Pu, JA Cahyono… - arXiv preprint arXiv …, 2024 - arxiv.org
The advances of large foundation models necessitate wide-coverage, low-cost, and zero-
contamination benchmarks. Despite continuous exploration of language model evaluations …

OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding

T Zhang, X Li, H Fei, H Yuan, S Wu, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …