Google Scholar

Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Cited by 8 - All 3 versions

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

WY Choong, Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.16771, 2024 - arxiv.org

Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

G Zhou, W Liu, W Huang, X Jia, X Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org

The lack of occlusion data in commonly used action recognition video datasets limits model
robustness and impedes sustained performance improvements. We construct OccludeNet, a …

All 3 versions

Saved to My library

Mitigating modality prior-induced hallucinations in multimodal large language models via...

Explainable and interpretable multimodal large language models: A comprehensive survey

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions