A survey on benchmarks of multimodal large language models
Multimodal Large Language Models (MLLMs) are gaining increasing popularity in both
academia and industry due to their remarkable performance in various applications such as …
On-device language models: A comprehensive review
The advent of large language models (LLMs) revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …
Video-MME: The first-ever comprehensive evaluation benchmark of multi-modal LLMs in video analysis
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs)
have emerged as a focal point in recent advancements. However, the predominant focus …
Chat-UniVi: Unified visual representation empowers large language models with image and video understanding
Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …
LITA: Language instructed temporal-localization assistant
There has been tremendous progress in multimodal Large Language Models (LLMs).
Recent works have extended these models to video input with promising instruction …
Video understanding with large language models: A survey
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …
Kangaroo: A powerful video-language model supporting long-context video input
Rapid advancements have been made in extending Large Language Models (LLMs) to
Large Multi-modal Models (LMMs). However, extending input modality of LLMs to video data …
LVBench: An extreme long video understanding benchmark
Recent progress in multimodal large language models has markedly enhanced the
understanding of short videos (typically under one minute), and several evaluation datasets …
MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the
performance of vision large language models (VLLMs), existing visual instruction tuning …
LongVU: Spatiotemporal adaptive compression for long video-language understanding
Multimodal Large Language Models (MLLMs) have shown promising progress in
understanding and analyzing video content. However, processing long videos remains a …