BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism

M Fajcik, M Docekal, J Dolezal, K Ondrej… - arXiv preprint arXiv …, 2024 - arxiv.org
We present BenCzechMark (BCM), the first comprehensive Czech language benchmark
designed for large language models, offering diverse tasks, multiple task formats, and …