BenCzechMark: A Czech-centric Multitask and Multimetric Benchmark for Large Language Models with Duel Scoring Mechanism

M Fajcik, M Docekal, J Dolezal, K Ondrej… - arXiv preprint arXiv …, 2024 - arxiv.org
We present BenCzechMark (BCM), the first comprehensive Czech language benchmark
designed for large language models, offering diverse tasks, multiple task formats, and …

We're Calling an Intervention: Exploring the Fundamental Hurdles in Adapting Language Models to Nonstandard Text

A Srivastava, D Chiang - arXiv preprint arXiv:2404.07304, 2024 - arxiv.org
We present a suite of experiments that allow us to understand the underlying challenges of
language model adaptation to nonstandard text. We do so by designing interventions that …