Rethinking machine unlearning for large language models
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …
Blind baselines beat membership inference attacks for foundation models
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
While large language models (LLMs) present significant potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …
ReCaLL: Membership inference via relative conditional log-likelihoods
The rapid scaling of large language models (LLMs) has raised concerns about the
transparency and fair use of their pretraining data. Detecting such …
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Large Language Models (LLMs) have demonstrated impressive capabilities in generating
diverse and contextually rich text. However, concerns regarding copyright infringement arise …
Membership inference attacks cannot prove that a model was trained on your data
We consider the problem of a training data proof, where a data creator or owner wants to
demonstrate to a third party that some machine learning model was trained on their data …
Position: LLM unlearning benchmarks are weak measures of progress
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …
INCLUDE: Evaluating multilingual language understanding with regional knowledge
The performance differential of large language models (LLMs) between languages hinders
their effective deployment in many regions, inhibiting the potential economic and societal …
Pretraining data detection for large language models: A divergence-based calibration method
As the scale of training corpora for large language models (LLMs) grows, model developers
become increasingly reluctant to disclose details on their data. This lack of transparency …
Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding
The training data in large language models is key to their success, but it also presents
privacy and security risks, as it may contain sensitive information. Detecting pre-training data …