Threats, attacks, and defenses in machine unlearning: A survey
Machine Unlearning (MU) has recently gained considerable attention due to its potential to
achieve Safe AI by removing the influence of specific data from trained Machine Learning …
Adversarial attacks and defenses on text-to-image diffusion models: A survey
C Zhang, M Hu, W Li, L Wang - Information Fusion, 2024 - Elsevier
Recently, the text-to-image diffusion model has gained considerable attention from the
community due to its exceptional image generation capability. A representative model …
Rethinking machine unlearning for large language models
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …
SalUn: Empowering machine unlearning via gradient-based weight saliency in both image classification and generation
With evolving data regulations, machine unlearning (MU) has become an important tool for
fostering trust and safety in today's AI models. However, existing MU methods focusing on …
Jailbreaking prompt attack: A controllable adversarial attack against diffusion models
Text-to-image (T2I) models can be maliciously used to generate harmful content such as
sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous …
MMA-Diffusion: Multimodal attack on diffusion models
In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining
widespread adoption. However, this progress has inadvertently opened avenues for …
Challenging forgets: Unveiling the worst-case forget sets in machine unlearning
The trustworthy machine learning (ML) community is increasingly recognizing the crucial
need for models capable of selectively 'unlearning' data points after training. This leads to the …
Self-discovering interpretable diffusion latent directions for responsible text-to-image generation
Diffusion-based models have gained significant popularity for text-to-image generation due
to their exceptional image-generation capabilities. A risk with these models is the potential …
Receler: Reliable concept erasing of text-to-image diffusion models via lightweight erasers
Concept erasure in text-to-image diffusion models aims to disable pre-trained
diffusion models from generating images related to a target concept. To perform reliable …
Reliable and efficient concept erasure of text-to-image diffusion models
Text-to-image models encounter safety issues, including concerns related to copyright and
Not-Safe-For-Work (NSFW) content. Although several methods have been proposed for …