Exploring the reasoning abilities of multimodal large language models (MLLMs): A comprehensive survey on emerging trends in multimodal reasoning

Y Wang, W Chen, X Han, X Lin, H Zhao, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Strong Artificial Intelligence (Strong AI) or Artificial General Intelligence (AGI) with abstract
reasoning ability is the goal of next-generation AI. Recent advancements in Large Language …

MMInstruct: A high-quality multi-modal instruction tuning dataset with extensive diversity

Y Liu, Y Cao, Z Gao, W Wang, Z Chen, W Wang… - Science China …, 2024 - Springer
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the
performance of vision large language models (VLLMs), existing visual instruction tuning …

Investigating and mitigating the multimodal hallucination snowballing in large vision-language models

W Zhong, X Feng, L Zhao, Q Li, L Huang, Y Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
Though advanced in understanding visual information with human languages, Large Vision-Language Models (LVLMs) still suffer from multimodal hallucinations. A natural concern is …

Searching Realistic-Looking Adversarial Objects For Autonomous Driving Systems

S Sun, S Zhu - arXiv preprint arXiv:2405.11629, 2024 - arxiv.org
Numerous studies on adversarial attacks targeting self-driving policies fail to incorporate
realistic-looking adversarial objects, limiting real-world applicability. Building upon prior …

Fine Tuning Idefic 9b With LORA for Multimodal Medical VQA

S Punneshetty, S Ashok… - 2024 International …, 2024 - ieeexplore.ieee.org
We introduce an innovative approach to Medical Visual Question Answering (VQA) by fine-tuning the Idefic 9b model on a combined dataset from SLAKE and VQARAD. Our …