Global MMLU: Understanding and addressing cultural and linguistic biases in multilingual evaluation

S Singh, A Romanou, C Fourrier, DI Adelani… - arXiv preprint arXiv …, 2024 - arxiv.org
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as
global benchmarks. These biases stem not only from language but also from the cultural …

INCLUDE: Evaluating multilingual language understanding with regional knowledge

A Romanou, N Foroutan, A Sotnikova, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The performance differential of large language models (LLMs) between languages hinders
their effective deployment in many regions, inhibiting the potential economic and societal …

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

AH Kargaran, A Modarressi, N Nikeghbal… - arXiv preprint arXiv …, 2024 - arxiv.org
English-centric large language models (LLMs) often show strong multilingual capabilities.
However, the multilingual performance of these models remains unclear and is not …

AfriInstruct: Instruction Tuning of African Languages for Diverse Tasks

K Uemura, M Chen, A Pejovic… - Findings of the …, 2024 - aclanthology.org
Large language models (LLMs) perform worse on African languages than on
high-resource languages. To address this issue, we introduce AfriInstruct …

The Roles of English in Evaluating Multilingual Language Models

W Poelman, M de Lhoneux - arXiv preprint arXiv:2412.08392, 2024 - arxiv.org
Multilingual natural language processing is getting increased attention, with numerous
models, benchmarks, and methods being released for many languages. English is often …

IberoBench: A Benchmark for LLM Evaluation in Iberian Languages

I Baucells, J Aula-Blasco, I de-Dios-Flores… - Proceedings of the …, 2025 - aclanthology.org
The current best practice to measure the performance of base Large Language Models is to
establish a multi-task benchmark that covers a range of capabilities of interest. Currently …

Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages

E Bayes, IA Azime, JO Alabi, J Kgomo… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluations of Large Language Models (LLMs) on knowledge-intensive tasks and factual
accuracy often focus on high-resource languages primarily because datasets for low …

Automatically Generating IsiZulu Words From Indo-Arabic Numerals

Z Mahlaza, T Magwenzi, CM Keet… - Proceedings of the 17th …, 2024 - aclanthology.org
Artificial conversational agents are deployed to assist humans in a variety of tasks. Some of
these tasks require the capability to communicate numbers as part of their internal and …

Large Language Models Compression via Low-Rank Feature Distillation

Y Sy, C Cerisara, I Illina - arXiv preprint arXiv:2412.16719, 2024 - arxiv.org
Current LLM structured pruning methods involve two steps: (1) compressing with calibration
data and (2) continued pretraining on billions of tokens to recover the lost performance. This …
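The general idea behind low-rank compression (not this paper's specific distillation method, which the snippet does not detail) can be sketched with a truncated SVD of a single weight matrix; the shapes and rank below are hypothetical choices for illustration:

```python
import numpy as np

# Hypothetical example: compress one linear layer's weight matrix W
# into two low-rank factors via truncated SVD.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))  # original weight matrix (illustrative size)

r = 64  # target rank (hypothetical choice)
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * S[:r]   # shape (256, r): left factor scaled by singular values
B = Vt[:r, :]          # shape (r, 512): right factor
W_approx = A @ B       # best rank-r approximation of W in Frobenius norm

# Storing A and B instead of W cuts parameters from m*n to r*(m+n).
orig_params = W.size
compressed_params = A.size + B.size
print(orig_params, compressed_params)  # 131072 49152
```

In practice such factored layers replace the originals and the model is then fine-tuned (or, as the paper's title suggests, distilled) to recover the accuracy lost to the rank truncation.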

Scaling Pre-training Data and Language Models for African Languages

A Oladipo - 2024 - uwspace.uwaterloo.ca
Recent advancements in language models, particularly for high-resource languages, have
not been paralleled in low-resource languages spoken across Africa. This thesis addresses …