Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

On protecting the data privacy of large language models (LLMs): A survey

B Yan, K Li, M Xu, Y Dong, Y Zhang, Z Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are complex artificial intelligence systems capable of
understanding, generating, and translating human language. They learn language patterns …

Preserving privacy in large language models: A survey on current threats and solutions

M Miranda, ES Ruzzetti, A Santilli, FM Zanzotto… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) represent a significant advancement in artificial
intelligence, finding applications across various domains. However, their reliance on …

Blind baselines beat membership inference attacks for foundation models

D Das, J Zhang, F Tramèr - arXiv preprint arXiv:2406.16201, 2024 - arxiv.org
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …

An archival perspective on pretraining data

MA Desai, IV Pasquetto, AZ Jacobs, D Card - Patterns, 2024 - cell.com
Alongside an explosion in research and development related to large language models,
there has been a concomitant rise in the creation of pretraining datasets—massive …

Min-K%++: Improved baseline for detecting pre-training data from large language models

J Zhang, J Sun, E Yeats, Y Ouyang, M Kuo… - arXiv preprint arXiv …, 2024 - arxiv.org
The problem of pre-training data detection for large language models (LLMs) has received
growing attention due to its implications in critical issues like copyright violation and test data …

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Exploring ChatGPT's Capabilities on Vulnerability Management

P Liu, J Liu, L Fu, K Lu, Y Xia, X Zhang… - 33rd USENIX Security …, 2024 - usenix.org
Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works
show that ChatGPT has the capabilities of processing foundational code analysis tasks …

Aligning llms to be robust against prompt injection

S Chen, A Zharmagambetov, S Mahloujifar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are becoming increasingly prevalent in modern software
systems, interfacing between the user and the internet to assist with tasks that require …

Open problems in machine unlearning for AI safety

F Barez, T Fu, A Prabhu, S Casper, A Sanyal… - arXiv preprint arXiv …, 2025 - arxiv.org
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …