Math-LLaVA: Bootstrapping mathematical reasoning for multimodal large language models

W Shi, Z Hu, Y Bin, J Liu, Y Yang, SK Ng, L Bing… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated impressive reasoning capabilities,
particularly in textual mathematical problem-solving. However, existing open-source image …

Recent advances in hate speech moderation: Multimodality and the role of large models

MS Hee, S Sharma, R Cao, P Nandi… - arXiv preprint arXiv …, 2024 - arxiv.org
In the evolving landscape of online communication, moderating hate speech (HS) presents
an intricate challenge, compounded by the multimodal nature of digital content. This …

Towards explainable harmful meme detection through multimodal debate between large language models

H Lin, Z Luo, W Gao, J Ma, B Wang… - Proceedings of the ACM …, 2024 - dl.acm.org
The age of social media is flooded with Internet memes, necessitating a clear grasp and
effective identification of harmful ones. This task presents a significant challenge due to the …

GOAT-Bench: Safety insights to large multimodal models through meme-based social abuse

H Lin, Z Luo, B Wang, R Yang, J Ma - arXiv preprint arXiv:2401.01523, 2024 - arxiv.org
The exponential growth of social media has profoundly transformed how information is
created, disseminated, and absorbed, exceeding any precedent in the digital age …

CofiPara: A coarse-to-fine paradigm for multimodal sarcasm target identification with large multimodal models

Z Chen, H Lin, Z Luo, M Cheng, J Ma… - Proceedings of the 62nd …, 2024 - aclanthology.org
Social media abounds with multimodal sarcasm, and identifying sarcasm targets is
particularly challenging due to the implicit incongruity not directly evident in the text and …

Describe-then-reason: Improving multimodal mathematical reasoning through visual comprehension training

M Jia, Z Zhang, W Yu, F Jiao, M Jiang - arXiv preprint arXiv:2404.14604, 2024 - arxiv.org
Open-source multimodal large language models (MLLMs) excel in various tasks involving
textual and visual inputs but still struggle with complex multimodal mathematical reasoning …

MFC-Bench: Benchmarking multimodal fact-checking with large vision-language models

S Wang, H Lin, Z Luo, Z Ye, G Chen, J Ma - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have significantly improved multimodal reasoning
tasks, such as visual question answering and image captioning. These models embed …

Fuser: An enhanced multimodal fusion framework with congruent reinforced perceptron for hateful memes detection

F Wu, B Gao, X Pan, L Li, Y Ma, S Liu, Z Liu - Information Processing & …, 2024 - Elsevier
As a multimodal form of hate speech on social media, hateful memes pose more aggressive
and covert threats to people's real lives. Automatic detection of hateful memes is crucial …

Boosting Scientific Concepts Understanding: Can Analogy from Teacher Models Empower Student Models?

S Yuan, C Jiayang, L Qiu, D Yang - arXiv preprint arXiv:2406.11375, 2024 - arxiv.org
Analogical reasoning plays a critical role in human cognition, enabling us to understand
new concepts by associating them with familiar ones. Previous research in the AI community …