Detecting pretraining data from large language models

W Shi, A Ajith, M Xia, Y Huang, D Liu, T Blevins… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) are widely deployed, the data used to train them is
rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but …
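For context, the detection method this paper introduces, Min-K% Prob, scores a text by averaging the log-probabilities of its k% least likely tokens under the target model; an unusually high score suggests the text was seen in pretraining. Below is a minimal sketch under assumed conditions: HuggingFace `transformers` is available and `gpt2` stands in for the (generally inaccessible) target model; the function name and k=0.2 default are illustrative, not the paper's reference implementation.

```python
# Sketch of a Min-K% Prob membership score for a causal LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def min_k_prob(text, model, tokenizer, k=0.2):
    """Average log-probability of the k% least likely tokens in `text`.
    Higher (less negative) scores suggest the text was seen in training."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Log-prob assigned to each actual next token.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    token_lp = log_probs.gather(1, ids[0, 1:].unsqueeze(-1)).squeeze(-1)
    n = max(1, int(k * token_lp.numel()))
    # Average over only the bottom-k% tokens, the paper's key idea.
    return token_lp.topk(n, largest=False).values.mean().item()

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(min_k_prob("The quick brown fox jumps over the lazy dog.", model, tokenizer))
```

Focusing on the lowest-probability tokens, rather than the average loss over all tokens, reduces the influence of generically predictable text.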

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

The Data Provenance Initiative: A large-scale audit of dataset licensing & attribution in AI

S Longpre, R Mahari, A Chen, N Obeng-Marnu… - arXiv preprint arXiv …, 2023 - arxiv.org
The race to train language models on vast, diverse, and inconsistently documented datasets
has raised pressing concerns about the legal and ethical risks for practitioners. To remedy …

The responsible foundation model development cheatsheet: A review of tools & resources

S Longpre, S Biderman, A Albalak… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation model development attracts a rapidly expanding body of contributors, scientists,
and applications. To help shape responsible development practices, we introduce the …

Consent in crisis: The rapid decline of the AI data commons

S Longpre, R Mahari, A Lee, C Lund, H Oderinwale… - NeurIPS, 2024 - hal.science
General-purpose artificial intelligence (AI) systems are built on massive swathes of public
web data, assembled into corpora such as C4, RefinedWeb, and Dolma. To our knowledge …

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org
Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

Evaluating copyright takedown methods for language models

B Wei, W Shi, Y Huang, NA Smith, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models (LMs) derive their capabilities from extensive training on diverse data,
including potentially copyrighted material. These models can memorize and generate …

Do membership inference attacks work on large language models?

M Duan, A Suri, N Mireshghallah, S Min, W Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Membership inference attacks (MIAs) attempt to predict whether a particular datapoint is a
member of a target model's training data. Despite extensive research on traditional machine …
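To make the MIA setting concrete, here is a minimal sketch of the classic loss-thresholding baseline this line of work evaluates: predict "member" when the model's language-modeling loss on a text falls below a threshold. The model name `gpt2` and the threshold `tau=3.0` are placeholder assumptions (in practice the threshold is calibrated on texts known to be non-members), not values from the paper.

```python
# Sketch of a loss-thresholding membership inference attack.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def is_member(text, model, tokenizer, tau=3.0):
    """Predict membership: low LM loss on `text` => likely in training data."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Mean token-level cross-entropy of `text` under the model.
        loss = model(ids, labels=ids).loss
    return loss.item() < tau

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(is_member("Call me Ishmael.", model, tokenizer))
```

The paper's question is whether attacks of this kind, which work well on small classifiers, remain reliable at LLM scale, where each example is seen at most a handful of times.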

SoK: Memorization in general-purpose large language models

V Hartmann, A Suri, V Bindschaedler, D Evans… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are advancing at a remarkable pace, with myriad
applications under development. Unlike most earlier machine learning models, they are no …