Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration of Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Mantis: Interleaved multi-image instruction tuning

D Jiang, X He, H Zeng, C Wei, M Ku, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large multimodal models (LMMs) have shown great results in single-image vision language
tasks. However, their ability to solve multi-image visual language tasks is yet to be …

Mllm-as-a-judge: Assessing multimodal llm-as-a-judge with vision-language benchmark

D Chen, R Chen, S Zhang, Y Wang, Y Liu… - … on Machine Learning, 2024 - openreview.net
Multimodal Large Language Models (MLLMs) have gained significant attention recently,
showing remarkable potential in artificial general intelligence. However, assessing the utility …

Embodied multi-modal agent trained by an llm from a parallel textworld

Y Yang, T Zhou, K Li, D Tao, L Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
While large language models (LLMs) excel in a simulated world of texts, they struggle to
interact with the more realistic world without perceptions of other modalities such as visual or …

Multimodal large language models in health care: applications, challenges, and future outlook

R AlSaad, A Abd-Alrazaq, S Boughorbel… - Journal of medical …, 2024 - jmir.org
In the complex and multidimensional field of medicine, multimodal data are prevalent and
crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types …

Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

Halc: Object hallucination reduction via adaptive focal-contrast decoding

Z Chen, Z Zhao, H Luo, H Yao, B Li, J Zhou - arXiv preprint arXiv …, 2024 - arxiv.org
While large vision-language models (LVLMs) have demonstrated impressive capabilities in
interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We …

Vhelm: A holistic evaluation of vision language models

T Lee, H Tu, CH Wong, W Zheng… - Advances in …, 2025 - proceedings.neurips.cc
Current benchmarks for assessing vision-language models (VLMs) often focus on their
perception or problem-solving capabilities and neglect other critical aspects such as …

Mme-survey: A comprehensive survey on evaluation of multimodal llms

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

Mitigating object hallucination via concentric causal attention

Y Xing, Y Li, I Laptev, S Lu - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent Large Vision Language Models (LVLMs) present remarkable zero-shot
conversational and reasoning capabilities given multimodal queries. Nevertheless, they …