An adversarial perspective on machine unlearning for AI safety

J Łucki, B Wei, Y Huang, P Henderson… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are finetuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …

Jogging the Memory of Unlearned Models Through Targeted Relearning Attacks

S Hu, Y Fu, S Wu, V Smith - … Workshop on Foundation Models in the …, 2024 - openreview.net
Machine unlearning is a promising approach to mitigate undesirable memorization of
training data in ML models. However, in this work we show that existing approaches for …

Position: LLM unlearning benchmarks are weak measures of progress

P Thaker, S Hu, N Kale, Y Maurya, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …

Towards robust knowledge unlearning: An adversarial framework for assessing and improving unlearning robustness in large language models

H Yuan, Z Jin, P Cao, Y Chen, K Liu, J Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs have achieved success in many fields but are still troubled by problematic content in their
training corpora. LLM unlearning aims to reduce its influence and avoid undesirable …

CURE4Rec: A benchmark for recommendation unlearning with deeper influence

C Chen, J Zhang, Y Zhang, L Zhang, L Lyu, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
With increasing privacy concerns in artificial intelligence, regulations have mandated the
right to be forgotten, granting individuals the right to withdraw their data from models …

A Closer Look at Machine Unlearning for Large Language Models

X Yuan, T Pang, C Du, K Chen, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) may memorize sensitive or copyrighted content, raising
privacy and legal concerns. Due to the high cost of retraining from scratch, researchers …

Alternate preference optimization for unlearning factual knowledge in large language models

A Mekala, V Dorna, S Dubey, A Lalwani… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine unlearning aims to efficiently eliminate the influence of specific training data,
known as the forget set, from the model. However, existing unlearning methods for Large …

To forget or not? Towards practical knowledge unlearning for large language models

B Tian, X Liang, S Cheng, Q Liu, M Wang, D Sui… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive
data, such as personal privacy information and copyrighted material. Recent advancements …

Targeted Angular Reversal of Weights (TARS) for Knowledge Removal in Large Language Models

HJ Davies, G Iacovides, DP Mandic - arXiv preprint arXiv:2412.10257, 2024 - arxiv.org
The sheer scale of data required to train modern large language models (LLMs) poses
significant risks, as models are likely to gain knowledge of sensitive topics such as bio …

Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities

H Liu, L Xiao, J Liu, X Li, Z Feng, S Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid advancement of Multimodal Large Language Models (MLLMs), a variety of
benchmarks have been introduced to evaluate their capabilities. While most evaluations …