Phi-3 technical report: A highly capable language model locally on your phone
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion
tokens, whose overall performance, as measured by both academic benchmarks and …
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
MiniCPM-V: A GPT-4V level MLLM on your phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
LLaVA-UHD: An LMM perceiving any aspect ratio and high-resolution images
Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding
the visual world. Conventional LMMs process images in fixed sizes and limited resolutions …
InternLM-XComposer-2.5: A versatile large vision-language model supporting long-contextual input and output
P Zhang, X Dong, Y Zang, Y Cao, R Qian, et al. - arXiv preprint, 2024 - arxiv.org
We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that
supports long-contextual input and output. IXC-2.5 excels in various text-image …
VLMEvalKit: An open-source toolkit for evaluating large multi-modality models
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …
A survey of robot intelligence with large language models
Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …
Eagle: Exploring the design space for multimodal LLMs with mixture of encoders
The ability to accurately interpret complex visual information is a crucial topic of multimodal
large language models (MLLMs). Recent work indicates that enhanced visual perception …
LMMs-Eval: Reality check on the evaluation of large multimodal models
The advances of large foundation models necessitate wide-coverage, low-cost, and zero-
contamination benchmarks. Despite continuous exploration of language model evaluations …
OMG-LLaVA: Bridging image-level, object-level, pixel-level reasoning and understanding
Current universal segmentation methods demonstrate strong capabilities in pixel-level
image and video understanding. However, they lack reasoning abilities and cannot be …