Knowledge editing for large language models: A survey
Large Language Models (LLMs) have recently transformed both the academic and industrial
landscapes due to their remarkable capacity to understand, analyze, and generate texts …
landscapes due to their remarkable capacity to understand, analyze, and generate texts …
A Survey of Multimodel Large Language Models
Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …
including vision, the technology of large language models is evolving from a single modality …
How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
MM1: methods, analysis and insights from multimodal LLM pre-training
In this work, we discuss building performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices …
In particular, we study the importance of various architecture components and data choices …
Deepseek-vl: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
world vision and language understanding applications. Our approach is structured around …
Cogvlm: Visual expert for pretrained language models
We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …
from the popular shallow alignment method which maps image features into the input space …
Mathverse: Does your multi-modal llm truly see the diagrams in visual math problems?
The remarkable progress of Multi-modal Large Language Models (MLLMs) has gained
unparalleled attention. However, their capabilities in visual math problem-solving remain …
unparalleled attention. However, their capabilities in visual math problem-solving remain …
Llava-uhd: an lmm perceiving any aspect ratio and high-resolution images
Visual encoding constitutes the basis of large multimodal models (LMMs) in understanding
the visual world. Conventional LMMs process images in fixed sizes and limited resolutions …
the visual world. Conventional LMMs process images in fixed sizes and limited resolutions …
Llava-next-interleave: Tackling multi-image, video, and 3d in large multimodal models
Visual instruction tuning has made considerable strides in enhancing the capabilities of
Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single …
Large Multimodal Models (LMMs). However, existing open LMMs largely focus on single …