DeepSeek-VL: Towards Real-World Vision-Language Understanding

H Lu, W Liu, B Zhang, B Wang, K Dong, B Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-
world vision and language understanding applications. Our approach is structured around …

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

A Survey on Evaluation of Multimodal Large Language Models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

A Survey on Multimodal Benchmarks: In the Era of Large AI Models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers

Z Pei, HL Zhen, X Yu, SJ Pan, M Yuan, B Yu - arXiv preprint arXiv …, 2024 - arxiv.org
Generative Pre-trained Transformers (GPTs) have demonstrated remarkable performance
across diverse domains through the extensive scaling of model parameters. Recent works …

Training on the Benchmark Is Not All You Need

S Ni, X Kong, C Li, X Hu, R Xu, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The success of Large Language Models (LLMs) relies heavily on the vast amount of data
learned during the pre-training phase. The opacity of the pre-training process and …

Multi-Label Cluster Discrimination for Visual Representation Learning

X An, K Yang, X Dai, Z Feng, J Deng - European Conference on Computer …, 2024 - Springer
Contrastive Language-Image Pre-training (CLIP) has recently demonstrated
success across various tasks due to superior feature representation empowered by image …

Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning

Y Yan, S Wang, J Huo, J Ye, Z Chu, X Hu… - arXiv preprint arXiv …, 2025 - arxiv.org
Scientific reasoning, the process through which humans apply logic, evidence, and critical
thinking to explore and interpret scientific phenomena, is essential in advancing knowledge …