A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
mPLUG-Owl2: Revolutionizing multi-modal large language model with modality collaboration
Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However, previous methods have …
TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
MiniCPM-V: A GPT-4V level MLLM on your phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …
Blink: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
Position: TrustLLM: Trustworthiness in large language models
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …
When do we not need larger vision models?
Scaling up the size of vision models has been the de facto standard to obtain more powerful
visual representations. In this work, we discuss the point beyond which larger vision models …
SPHINX-X: Scaling data and parameters for a family of multi-modal large language models
We propose SPHINX-X, an extensive Multimodality Large Language Model (MLLM) series
developed upon SPHINX. To improve the architecture and training efficiency, we modify the …