Benchmark data contamination of large language models: A survey

C Xu, S Guan, D Greene, M Kechadi - arXiv preprint arXiv:2406.04244, 2024 - arxiv.org
The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and
Gemini has transformed the field of natural language processing. However, it has also …

How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library

M Ravaut, B Ding, F Jiao, H Chen, X Li, R Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rise of Large Language Models (LLMs) in recent years, abundant new opportunities
are emerging, but also new challenges, among which contamination is quickly becoming …

Data contamination report from the 2024 CONDA shared task

O Sainz, I García-Ferrero, A Jacovi, JA Campos… - arXiv preprint arXiv …, 2024 - arxiv.org
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of
data contamination in natural language processing, where data contamination is understood …

A Survey on Data Contamination for Large Language Models

Y Cheng, Y Chang, Y Wu - arXiv preprint arXiv:2502.14425, 2025 - arxiv.org
Recent advancements in Large Language Models (LLMs) have demonstrated significant
progress in various areas, such as text generation and code synthesis. However, the …

Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation

S Chen, Y Chen, Z Li, Y Jiang, Z Wan, Y He… - arXiv preprint arXiv …, 2025 - arxiv.org
Data contamination has received increasing attention in the era of large language models
(LLMs) due to their reliance on vast Internet-derived training corpora. To mitigate the risk of …

Does Data Contamination Detection Work (Well) for LLMs? A Survey and Evaluation on Detection Assumptions

Y Fu, O Uzuner, M Yetisgen, F Xia - arXiv preprint arXiv:2410.18966, 2024 - arxiv.org
Large language models (LLMs) have demonstrated great performance across various
benchmarks, showing potential as general-purpose task solvers. However, as LLMs are …

Confounders in instance variation for the analysis of data contamination

B Mehrbakhsh, D Garigliotti… - Proceedings of the …, 2024 - aclanthology.org
Test contamination is a serious problem for the evaluation of large language models (LLMs)
because it leads to the overestimation of their performance and a quick saturation of …

Termite Italian Text-to-SQL: A CALAMITA Challenge

F Ranaldi, ES Ruzzetti, D Onorati… - Proceedings of the 10th …, 2024 - ceur-ws.org
Relational databases play an important role in business, science, and beyond. However, the
operability of relational databases is restricted to users familiar with specific languages such …

The limits of Italian in Reasoning Tasks

L Ranaldi, F Ranaldi, G Pucci, ES Ruzzetti… - 2024 - ceur-ws.org
Earlier works have shown the efficacy of reasoning methods in eliciting step-wise
reasoning of large language models (LLMs) by operating via in-context demonstrations …

How far does the sequence of compositions impact Multilingual Pre-Training?

L Ranaldi, G Pucci, FM Zanzotto - 2024 - ceur-ws.org
An efficient strategy for conducting pre-training of language models is the concatenation of
contiguous sequences of text of fixed length through causal masking that estimates the …