Global MMLU: Understanding and addressing cultural and linguistic biases in multilingual evaluation

S Singh, A Romanou, C Fourrier, DI Adelani… - arXiv preprint arXiv …, 2024 - arxiv.org
Cultural biases in multilingual datasets pose significant challenges for their effectiveness as
global benchmarks. These biases stem not only from language but also from the cultural …

CUNI at WMT24 general translation task: LLMs, (Q)LoRA, CPO and model merging

M Hrabal, J Jon, M Popel, N Luu… - Proceedings of the …, 2024 - aclanthology.org
This paper presents the contributions of Charles University teams to the WMT24 General
Translation task (English to Czech, German and Russian, and Czech to Ukrainian), and the …

NTTSU at WMT2024 general translation task

M Kondo, R Fukuda, X Wang, K Chousa… - Proceedings of the …, 2024 - aclanthology.org
The NTTSU team's submission leverages several large language models developed
through a training procedure that includes continual pre-training and supervised fine-tuning …

Generics are puzzling. Can language models find the missing piece?

GC Calderón, E Allaway, B Haddow, A Birch - arXiv preprint arXiv …, 2024 - arxiv.org
Generic sentences express generalisations about the world without explicit quantification.
Although generics are central to everyday communication, building a precise semantic …

Machine Translation Metrics are better in evaluating Linguistic Errors on LLMs than on Encoder-Decoder Systems

E Avramidis, S Manakhimova… - Proceedings of the …, 2024 - aclanthology.org
This year's MT metrics challenge set submission by DFKI expands on previous years'
linguistically motivated challenge sets. It includes 137,000 items extracted from 100 MT …

IOL Research machine translation systems for WMT24 general machine translation shared task

W Zhang - Proceedings of the Ninth Conference on Machine …, 2024 - aclanthology.org
This paper illustrates the submission system of the IOL Research team for the WMT24
General Machine Translation shared task. We submitted translations for all translation …

Generics are puzzling. Can language models find the missing piece?

G Cilleruelo, E Allaway, B Haddow… - Proceedings of the 31st …, 2025 - aclanthology.org
Generic sentences express generalisations about the world without explicit quantification.
Although generics are central to everyday communication, building a precise semantic …

The Science of Evaluating Foundation Models

J Yuan, J Zhang, A Wen, X Hu - arXiv preprint arXiv:2502.09670, 2025 - arxiv.org
The emergent phenomena of large foundation models have revolutionized natural language
processing. However, evaluating these models presents significant challenges due to their …

Make Imagination Clearer! Stable Diffusion-based Visual Imagination for Multimodal Machine Translation

A Chen, Y Song, K Chen, M Yang, T Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual information has been introduced for enhancing machine translation (MT), and its
effectiveness heavily relies on the availability of large amounts of bilingual parallel sentence …

A Bayesian Optimization Approach to Machine Translation Reranking

J Cheng, M Züfle, V Zouhar, A Vlachos - arXiv preprint arXiv:2411.09694, 2024 - arxiv.org
Reranking a list of candidates from a machine translation system with an external scoring
model and returning the highest-scoring candidate remains a simple and effective method …