Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Blind baselines beat membership inference attacks for foundation models

D Das, J Zhang, F Tramèr - arXiv preprint arXiv:2406.16201, 2024 - arxiv.org
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

H Wang, W Fu, Y Tang, Z Chen, Y Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
While large language models (LLMs) present significant potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …

ReCaLL: Membership inference via relative conditional log-likelihoods

R Xie, J Wang, R Huang, M Zhang, R Ge, J Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid scaling of large language models (LLMs) has raised concerns about the
transparency and fair use of the pretraining data used for training them. Detecting such …

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

MA Panaitescu-Liess, Z Che, B An, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive capabilities in generating
diverse and contextually rich text. However, concerns regarding copyright infringement arise …

Membership inference attacks cannot prove that a model was trained on your data

J Zhang, D Das, G Kamath, F Tramèr - arXiv preprint arXiv:2409.19798, 2024 - arxiv.org
We consider the problem of a training data proof, where a data creator or owner wants to
demonstrate to a third party that some machine learning model was trained on their data …

Position: LLM unlearning benchmarks are weak measures of progress

P Thaker, S Hu, N Kale, Y Maurya, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …

INCLUDE: Evaluating multilingual language understanding with regional knowledge

A Romanou, N Foroutan, A Sotnikova, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The performance differential of large language models (LLMs) between languages hinders
their effective deployment in many regions, inhibiting the potential economic and societal …

Pretraining data detection for large language models: A divergence-based calibration method

W Zhang, R Zhang, J Guo, M de Rijke, Y Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
As the scale of training corpora for large language models (LLMs) grows, model developers
become increasingly reluctant to disclose details on their data. This lack of transparency …

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

C Wang, Y Wang, B Hooi, Y Cai, N Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
The training data in large language models is key to their success, but it also presents
privacy and security risks, as it may contain sensitive information. Detecting pre-training data …