Math-LLaVA: Bootstrapping mathematical reasoning for multimodal large language models

W Shi, Z Hu, Y Bin, J Liu, Y Yang, SK Ng, L Bing… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated impressive reasoning capabilities,
particularly in textual mathematical problem-solving. However, existing open-source image …

Lmod: A large multimodal ophthalmology dataset and benchmark for large vision-language models

Z Qin, Y Yin, D Campbell, X Wu, K Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
The prevalence of vision-threatening eye diseases is a significant global burden, with many
cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision …

The VLLM safety paradox: Dual ease in jailbreak attack and defense

Y Guo, F Jiao, L Nie, M Kankanhalli - arXiv preprint arXiv:2411.08410, 2024 - arxiv.org
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks appears as
no surprise. However, recent defense mechanisms against these attacks have reached near …

EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios

L Qiu, Y Ge, Y Chen, Y Ge, Y Shan, X Liu - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Multimodal Large Language Models, leveraging the power of Large
Language Models, has recently demonstrated superior multimodal understanding and …

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

A Miyai, J Yang, J Zhang, Y Ming, Q Yu, G Irie… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces a novel and significant challenge for Vision Language Models
(VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to …

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

M Jia, Z Zhang, W Yu, F Jiao, M Jiang - arXiv preprint arXiv:2404.14604, 2024 - arxiv.org
Open-source multimodal large language models (MLLMs) excel in various tasks involving
textual and visual inputs but still struggle with complex multimodal mathematical reasoning …

CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering

B Vardi, O Nir, A Shamir - arXiv preprint arXiv:2501.01371, 2025 - arxiv.org
Recent Vision-Language Models (VLMs) have demonstrated remarkable capabilities in
visual understanding and reasoning, and in particular on multiple-choice Visual Question …

When Not to Answer: Evaluating Prompts on GPT Models for Effective Abstention in Unanswerable Math Word Problems

A Saadat, TB Sogir, MTA Chowdhury, S Aziz - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are increasingly relied upon to solve complex mathematical
word problems. However, being susceptible to hallucination, they may generate inaccurate …

Unsolvable Problem Detection for Vision Language Models

A Miyai, J Yang, J Zhang, Y Ming, Q Yu, G Irie… - ICLR 2024 Workshop on … - openreview.net
This paper introduces a novel and significant challenge for Vision Language Models
(VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to …

Unsolvable Problem Detection: Evaluating Trustworthiness of Large Multimodal Models

S Detection - openreview.net
This paper introduces a novel and well-defined challenge for Large Multimodal Models
(LMMs), termed Unsolvable Problem Detection (UPD). UPD examines the LMM's ability to …