Threats, attacks, and defenses in machine unlearning: A survey

Z Liu, H Ye, C Chen, Y Zheng, KY Lam - arXiv preprint arXiv:2403.13682, 2024 - arxiv.org
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …

Adversarial attacks and defenses on text-to-image diffusion models: A survey

C Zhang, M Hu, W Li, L Wang - Information Fusion, 2024 - Elsevier
Recently, the text-to-image diffusion model has gained considerable attention from the
community due to its exceptional image generation capability. A representative model …

Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …

Salun: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation

C Fan, J Liu, Y Zhang, E Wong, D Wei, S Liu - arXiv preprint arXiv …, 2023 - arxiv.org
With evolving data regulations, machine unlearning (MU) has become an important tool for
fostering trust and safety in today's AI models. However, existing MU methods focusing on …

Jailbreaking prompt attack: A controllable adversarial attack against diffusion models

J Ma, A Cao, Z Xiao, Y Li, J Zhang, C Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) models can be maliciously used to generate harmful content such as
sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous …

Mma-diffusion: Multimodal attack on diffusion models

Y Yang, R Gao, X Wang, TY Ho… - Proceedings of the …, 2024 - openaccess.thecvf.com
In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining
widespread adoption. However, this progress has inadvertently opened avenues for …

Challenging forgets: Unveiling the worst-case forget sets in machine unlearning

C Fan, J Liu, A Hero, S Liu - European Conference on Computer Vision, 2024 - Springer
The trustworthy machine learning (ML) community is increasingly recognizing the crucial
need for models capable of selectively 'unlearning' data points after training. This leads to the …

Self-discovering interpretable diffusion latent directions for responsible text-to-image generation

H Li, C Shen, P Torr, V Tresp… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Diffusion-based models have gained significant popularity for text-to-image generation due
to their exceptional image-generation capabilities. A risk with these models is the potential …

Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers

CP Huang, KP Chang, CT Tsai, YH Lai… - … on Computer Vision, 2024 - Springer
Concept erasure in text-to-image diffusion models aims to disable pre-trained
diffusion models from generating images related to a target concept. To perform reliable …

Reliable and efficient concept erasure of text-to-image diffusion models

C Gong, K Chen, Z Wei, J Chen, YG Jiang - European Conference on …, 2024 - Springer
Text-to-image models encounter safety issues, including concerns related to copyright and
Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for …