MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
Expanding the long-context capabilities of Multi-modal Large Language Models~(MLLMs) is
crucial for video understanding, high-resolution image understanding, and multi-modal …
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Although current Multi-modal Large Language Models (MLLMs) demonstrate promising
results in video understanding, processing extremely long videos remains an ongoing …
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Long-context modeling is a critical capability for multimodal large language models
(MLLMs), enabling them to process long-form contents with implicit memorization. Despite …
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Large Multimodal Models (LMMs) have made significant strides in visual question-
answering for single images. Recent advancements like long-context LMMs have allowed …
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
This paper aims to improve the performance of video multimodal large language models
(MLLMs) via long and rich context (LRC) modeling. As a result, we develop a new version of …
VCBench: A Controllable Benchmark for Symbolic and Abstract Challenges in Video Cognition
Recent advancements in Large Video-Language Models (LVLMs) have driven the
development of benchmarks designed to assess cognitive abilities in video-based tasks …