Benchmark evaluations, applications, and challenges of large vision language models: A survey

Z Li, X Wu, H Du, H Nghiem, G Shi - arXiv preprint arXiv:2501.02189, 2025 - arxiv.org
Multimodal Vision Language Models (VLMs) have emerged as a transformative technology
at the intersection of computer vision and natural language processing, enabling machines …

All languages matter: Evaluating LMMs on culturally diverse 100 languages

A Vayani, D Dissanayake, H Watawana… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing Large Multimodal Models (LMMs) generally focus on only a few regions and
languages. As LMMs continue to improve, it is increasingly important to ensure they …

Survey of cultural awareness in language models: Text and beyond

S Pawar, J Park, J Jin, A Arora, J Myung… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale deployment of large language models (LLMs) in various applications, such as
chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure …

From local concepts to universals: Evaluating the multicultural understanding of vision-language models

M Bhatia, S Ravi, A Chinchure, E Hwang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advancements in vision-language models, their performance remains
suboptimal on images from non-Western cultures due to underrepresentation in training …

WorldCuisines: A massive-scale benchmark for multilingual and multicultural visual question answering on global cuisines

GI Winata, F Hudi, PA Irawan, D Anugraha… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly
in languages other than English and in underrepresented cultural contexts. To evaluate their …

GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks

MS Danish, MA Munir, SRA Shah, K Kuckreja… - arXiv preprint arXiv …, 2024 - arxiv.org
While numerous recent benchmarks focus on evaluating generic Vision-Language Models
(VLMs), they fall short in addressing the unique demands of geospatial applications. Generic …

Cultural Adaptation of Menus: A Fine-Grained Approach

Z Zhang, X He, V Iyer, A Birch - arXiv preprint arXiv:2408.13534, 2024 - arxiv.org
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent
work on CSI translation has shown some success using Large Language Models (LLMs) to …

Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models

MD Bui, K von der Wense, A Lauscher - arXiv preprint arXiv:2411.03888, 2024 - arxiv.org
Warning: this paper contains content that may be offensive or upsetting. Hate speech
moderation on global platforms poses unique challenges due to the multimodal and …

LLM-GLOBE: A Benchmark Evaluating the Cultural Values Embedded in LLM Output

E Karinshak, A Hu, K Kong, V Rao, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Immense effort has been dedicated to minimizing the presence of harmful or biased
generative content and better aligning AI output to human intention; however, research …

CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries

S Liu, Y Jin, C Li, DF Wong, Q Wen, L Sun… - arXiv preprint arXiv …, 2025 - arxiv.org
Vision-language models (VLMs) have advanced human-AI interaction but struggle with
cultural understanding, often misinterpreting symbols, gestures, and artifacts due to biases …