An Adversarial Perspective on Machine Unlearning for AI Safety
Large language models are fine-tuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …
Jogging the Memory of Unlearned Models Through Targeted Relearning Attacks
Machine unlearning is a promising approach to mitigate undesirable memorization of
training data in ML models. However, in this work we show that existing approaches for …
Position: LLM Unlearning Benchmarks Are Weak Measures of Progress
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …
Towards Robust Knowledge Unlearning: An Adversarial Framework for Assessing and Improving Unlearning Robustness in Large Language Models
LLMs have achieved success in many fields but are still troubled by problematic content in their
training corpora. LLM unlearning aims at reducing its influence and avoiding undesirable …
CURE4Rec: A Benchmark for Recommendation Unlearning with Deeper Influence
With increasing privacy concerns in artificial intelligence, regulations have mandated the
right to be forgotten, granting individuals the right to withdraw their data from models …
A Closer Look at Machine Unlearning for Large Language Models
Large language models (LLMs) may memorize sensitive or copyrighted content, raising
privacy and legal concerns. Due to the high cost of retraining from scratch, researchers …
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models
Machine unlearning aims to efficiently eliminate the influence of specific training data,
known as the forget set, from the model. However, existing unlearning methods for Large …
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive
data, such as personal privacy information and copyrighted material. Recent advancements …
Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models
The sheer scale of data required to train modern large language models (LLMs) poses
significant risks, as models are likely to gain knowledge of sensitive topics such as bio …
Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities
With the rapid advancement of Multimodal Large Language Models (MLLMs), a variety of
benchmarks have been introduced to evaluate their capabilities. While most evaluations …