Multi-object hallucination in vision-language models
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arxiv preprint arxiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis
(Video-LLMs), particularly in understanding and interpreting long videos. However, existing Video …
Mitigating Language Bias of LMMs in Social Intelligence Understanding with Virtual Counterfactual Calibration
Social intelligence is essential for understanding complex human expressions and social
interactions. While large multimodal models (LMMs) have demonstrated remarkable …