Detecting pretraining data from large language models

W Shi, A Ajith, M Xia, Y Huang, D Liu, T Blevins… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) are widely deployed, the data used to train them is
rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but …
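For context, the detection method this paper introduces, Min-K% Prob, scores a text by averaging the log-probabilities of its k% least likely tokens under the target model; an unusually high score suggests the text was seen in pretraining. Below is a minimal sketch under assumed conditions: HuggingFace `transformers` is available and `gpt2` stands in for the (generally inaccessible) target model; the function name and k=0.2 default are illustrative, not the paper's reference implementation.

```python
# Sketch of a Min-K% Prob membership score for a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_prob(text, model, tokenizer, k=0.2):
    """Average log-probability of the k% least likely tokens in `text`.
    Higher (less negative) scores suggest the text was seen in training."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-prob assigned to each actual next token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    n = max(1, int(k * token_lp.numel()))
    # Average over only the bottom-k% tokens, the paper's key idea.
    return token_lp.topk(n, largest=False).values.mean().item()

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(min_k_prob("The quick brown fox jumps over the lazy dog.", model, tokenizer))
```

Focusing on the lowest-probability tokens, rather than the average loss over all tokens, reduces the influence of generically predictable text.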

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

The Data Provenance Initiative: A large-scale audit of dataset licensing & attribution in AI

S Longpre, R Mahari, A Chen, N Obeng-Marnu… - arXiv preprint arXiv …, 2023 - arxiv.org
The race to train language models on vast, diverse, and inconsistently documented datasets
has raised pressing concerns about the legal and ethical risks for practitioners. To remedy …

The responsible foundation model development cheatsheet: A review of tools & resources

S Longpre, S Biderman, A Albalak… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation model development attracts a rapidly expanding body of contributors, scientists,
and applications. To help shape responsible development practices, we introduce the …

Consent in crisis: The rapid decline of the AI data commons

S Longpre, R Mahari, A Lee, C Lund, H Oderinwale… - NeurIPS, 2024 - hal.science
General-purpose artificial intelligence (AI) systems are built on massive swathes of public
web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

Evaluating copyright takedown methods for language models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

Do membership inference attacks work on large language models?

M Duan, A Suri, N Mireshghallah, S Min, W Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a
member of a target model's training data. Despite extensive research on traditional machine …
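To make the MIA setting concrete, here is a minimal sketch of the classic loss-thresholding baseline this line of work evaluates: predict "member" when the model's language-modeling loss on a text falls below a threshold. The model name `gpt2` and the threshold `tau=3.0` are placeholder assumptions (in practice the threshold is calibrated on texts known to be non-members), not values from the paper.

```python
# Sketch of a loss-thresholding membership inference attack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def is_member(text, model, tokenizer, tau=3.0):
    """Predict membership: low LM loss on `text` => likely in training data."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Mean token-level cross-entropy of `text` under the model.
        loss = model(ids, labels=ids).loss
    return loss.item() < tau

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(is_member("Call me Ishmael.", model, tokenizer))
```

The paper's question is whether attacks of this kind, which work well on small classifiers, remain reliable at LLM scale, where each example is seen at most a handful of times.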

SoK: Memorization in general-purpose large language models

V Hartmann, A Suri, V Bindschaedler, D Evans… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad
applications under development. Unlike most earlier machine learning models, they are no …