- Academic Search

C Chen, K Shu - AI Magazine, 2024 - Wiley Online Library

Misinformation such as fake news and rumors is a serious threat for information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Speichern Zitieren Zitiert von: 128 Ähnliche Artikel Alle 4 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] oup.com

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Speichern Zitieren Zitiert von: 159 Ähnliche Artikel Alle 7 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arxiv preprint arxiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

Speichern Zitieren Zitiert von: 3620 Ähnliche Artikel Alle 4 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Speichern Zitieren Zitiert von: 359 Ähnliche Artikel Alle 2 Versionen

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Speichern Zitieren Zitiert von: 542 Ähnliche Artikel Alle 3 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] acm.org

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2024 - dl.acm.org

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Speichern Zitieren Zitiert von: 106 Ähnliche Artikel

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Eyes wide shut? exploring the visual shortcomings of multimodal llms

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com

Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However the …

Speichern Zitieren Zitiert von: 227 Ähnliche Artikel Alle 4 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Fine-tuning large vision-language models as decision-making agents via reinforcement learning

S Zhai, H Bai, Z Lin, J Pan, P Tong… - Advances in …, 2025 - proceedings.neurips.cc

Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following
data have exhibited impressive language reasoning capabilities across various scenarios …

Speichern Zitieren Zitiert von: 37 Ähnliche Artikel Alle 2 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding reasoning and interaction. However …

Speichern Zitieren Zitiert von: 162 Ähnliche Artikel Alle 3 Versionen HTML-Version

[Free GPT-4]
[DeepSeek]

[PDF] thecvf.com

Chat-univi: Unified visual representation empowers large language models with image and video understanding

P **, R Takanobu, W Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Large language models have demonstrated impressive universal capabilities across a wide
range of open-ended tasks and have extended their utility to encompass multimodal …

Speichern Zitieren Zitiert von: 170 Ähnliche Artikel Alle 4 Versionen HTML-Version

Zitieren

Erweiterte Suche

In „Meine Bibliothek“ gespeichert

Combating misinformation in the age of llms: Opportunities and challenges

A Survey of Multimodel Large Language Models

A survey of large language models

How far are we to gpt-4v? closing the gap to commercial multimodal models with open-source suites

Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

Eyes wide shut? exploring the visual shortcomings of multimodal llms

Fine-tuning large vision-language models as decision-making agents via reinforcement learning

Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback

Chat-univi: Unified visual representation empowers large language models with image and video understanding