Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Reefknot: A comprehensive benchmark for relation hallucination evaluation, analysis and mitigation in multimodal large language models

K Zheng, J Chen, Y Yan, X Zou, X Hu - arXiv preprint arXiv:2408.09429, 2024 - arxiv.org
Hallucination issues persistently plague current multimodal large language models
(MLLMs). While existing research primarily focuses on object-level or attribute-level …

Miner: Mining the underlying pattern of modality-specific neurons in multimodal large language models

K Huang, J Huo, Y Yan, K Wang, Y Yue… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, multimodal large language models (MLLMs) have significantly advanced,
integrating more modalities into diverse applications. However, the lack of explainability …

Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality

G Zhou, Y Yan, X Zou, K Wang, A Liu, X Hu - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both
industry and academia, but often suffer from biases introduced by visual and language …

ErrorRadar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-LLMs),
particularly in understanding and interpreting long videos. However, existing Video …

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models

I Cohen, D Gottesman, M Geva, R Giryes - arXiv preprint arXiv:2412.14133, 2024 - arxiv.org
Vision-language models (VLMs) excel at extracting and reasoning about information from
images. Yet, their capacity to leverage internal knowledge about specific entities remains …