Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

To generate or not? Safety-driven unlearned diffusion models are still easy to generate unsafe images... for now

Y Zhang, J Jia, X Chen, A Chen, Y Zhang, J Liu… - … on Computer Vision, 2024 - Springer
The recent advances in diffusion models (DMs) have revolutionized the generation of
realistic and complex images. However, these models also introduce potential safety …

Practical unlearning for large language models

C Gao, L Wang, C Weng, X Wang, Q Zhu - arXiv preprint arXiv:2407.10223, 2024 - arxiv.org
While LLMs have demonstrated impressive performance across various domains and tasks,
their security issues have become increasingly severe. Machine unlearning (MU) has …

Jogging the Memory of Unlearned Models Through Targeted Relearning Attacks

S Hu, Y Fu, S Wu, V Smith - … Workshop on Foundation Models in the …, 2024 - openreview.net
Machine unlearning is a promising approach to mitigate undesirable memorization of
training data in ML models. However, in this work we show that existing approaches for …

Meta-unlearning on diffusion models: Preventing relearning unlearned concepts

H Gao, T Pang, C Du, T Hu, Z Deng, M Lin - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid progress of diffusion-based content generation, significant efforts are being
made to unlearn harmful or copyrighted concepts from pretrained diffusion models (DMs) to …

On effects of steering latent representation for large language model unlearning

D Huu-Tien, TT Pham, H Thanh-Tung… - arXiv preprint arXiv …, 2024 - arxiv.org
Representation Misdirection for Unlearning (RMU), which steers model representation in the
intermediate layer to a target random representation, is an effective method for large …
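The RMU mechanism summarized above lends itself to a short illustration. Below is a minimal sketch of an RMU-style unlearning objective, assuming a Hugging Face-style causal LM interface; the function name rmu_style_loss, the layer index, and the coefficients are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def rmu_style_loss(model, frozen_model, forget_batch, retain_batch,
                   layer_idx=7, steering_coeff=20.0, alpha=100.0):
    """Sketch of an RMU-style objective: push intermediate-layer activations
    on forget data toward a fixed random control vector, while keeping
    retain-data activations close to those of the frozen reference model.
    All hyperparameters here are illustrative."""
    hidden = model.config.hidden_size
    # Fixed random direction scaled by the steering coefficient.
    control_vec = steering_coeff * torch.rand(hidden)

    # Activations of the model being unlearned at the chosen layer.
    h_forget = model(**forget_batch,
                     output_hidden_states=True).hidden_states[layer_idx]
    h_retain = model(**retain_batch,
                     output_hidden_states=True).hidden_states[layer_idx]
    with torch.no_grad():
        h_retain_ref = frozen_model(
            **retain_batch, output_hidden_states=True
        ).hidden_states[layer_idx]

    # Forget loss: steer forget-set activations toward the random target
    # (control vector moved to the activations' device/dtype for the sketch).
    target = control_vec.to(h_forget).expand_as(h_forget)
    forget_loss = F.mse_loss(h_forget, target)
    # Retain loss: stay close to the original model on retain data.
    retain_loss = F.mse_loss(h_retain, h_retain_ref)
    return forget_loss + alpha * retain_loss
```

In practice this loss would be minimized with an optimizer over a subset of the model's parameters; the paper cited above studies how such latent steering behaves, rather than prescribing these exact settings.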

Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

AF Cooper, CA Choquette-Choo, M Bogen… - arXiv preprint arXiv …, 2024 - arxiv.org
We articulate fundamental mismatches between technical methods for machine unlearning
in Generative AI, and documented aspirations for broader impact that these methods could …

Alternate preference optimization for unlearning factual knowledge in large language models

A Mekala, V Dorna, S Dubey, A Lalwani… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine unlearning aims to efficiently eliminate the influence of specific training data,
known as the forget set, from the model. However, existing unlearning methods for Large …

A Closer Look at Machine Unlearning for Large Language Models

X Yuan, T Pang, C Du, K Chen, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) may memorize sensitive or copyrighted content, raising
privacy and legal concerns. Due to the high cost of retraining from scratch, researchers …

Open Problems in Machine Unlearning for AI Safety

F Barez, T Fu, A Prabhu, S Casper, A Sanyal… - arXiv preprint arXiv …, 2025 - arxiv.org
As AI systems become more capable, widely deployed, and increasingly autonomous in
critical areas such as cybersecurity, biological research, and healthcare, ensuring their …