Deepseek-vl: towards real-world vision-language understanding
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …
world vision and language understanding applications. Our approach is structured around …
Mme-survey: A comprehensive survey on evaluation of multimodal llms
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …
Models (MLLMs) have garnered increased attention from both industry and academia …
A survey on evaluation of multimodal large language models
J Huang, J Zhang - arxiv preprint arxiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …
system by integrating powerful Large Language Models (LLMs) with various modality …
A survey on multimodal benchmarks: In the era of large ai models
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …
advancements in artificial intelligence, significantly enhancing the capability to understand …
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers
Generative Pre-trained Transformers (GPTs) have demonstrated remarkable performance
across diverse domains through the extensive scaling of model parameters. Recent works …
across diverse domains through the extensive scaling of model parameters. Recent works …
Training on the Benchmark Is Not All You Need
The success of Large Language Models (LLMs) relies heavily on the huge amount of pre-
training data learned in the pre-training phase. The opacity of the pre-training process and …
training data learned in the pre-training phase. The opacity of the pre-training process and …
Multi-label cluster discrimination for visual representation learning
Abstract Contrastive Language Image Pre-training (CLIP) has recently demonstrated
success across various tasks due to superior feature representation empowered by image …
success across various tasks due to superior feature representation empowered by image …
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Scientific reasoning, the process through which humans apply logic, evidence, and critical
thinking to explore and interpret scientific phenomena, is essential in advancing knowledge …
thinking to explore and interpret scientific phenomena, is essential in advancing knowledge …