Bridging the Data Provenance Gap Across Text, Speech and Video

S Longpre, N Singh, M Cherep, K Tiwary… - arXiv preprint arXiv …, 2024 - arxiv.org
Progress in AI is driven largely by the scale and quality of training data. Despite this, there is
a deficit of empirical analysis examining the attributes of well-established datasets beyond …

BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism

M Fajcik, M Docekal, J Dolezal, K Ondrej… - arXiv preprint arXiv …, 2024 - arxiv.org
We present BenCzechMark (BCM), the first comprehensive Czech language benchmark
designed for large language models, offering diverse tasks, multiple task formats, and …

Bridging the Gap: Enhancing LLM Performance for Low-Resource African Languages with New Benchmarks, Fine-Tuning, and Cultural Adjustments

T Alhanai, A Kasumovic, M Ghassemi… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have shown remarkable performance across various tasks,
yet significant disparities remain for non-English languages, and especially native African …