AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Recently, multimodal large language models (MLLMs), such as GPT-4o, Gemini 1.5 Pro,
and Reka Core, have expanded their capabilities to include vision and audio modalities …
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Despite the recent breakthroughs achieved by Large Vision Language Models (LVLMs) in
understanding and responding to complex visual-textual contexts, their inherent …
Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals
MHJ Lee, S Jeon. arXiv preprint arXiv:2412.09668, 2024.
Vision-Language Models (VLMs) combine Large Language Model (LLM) capabilities with
image processing, enabling tasks like image captioning and text-to-image generation. Yet …
Self-Training Large Language and Vision Assistant for Medical Question Answering
Large Vision-Language Models (LVLMs) have shown significant potential in
assisting medical diagnosis by leveraging extensive biomedical datasets. However, the …
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Cross-modal reasoning (CMR), the intricate process of synthesizing and drawing inferences
across divergent sensory modalities, is increasingly recognized as a crucial capability in the …