An adversarial perspective on machine unlearning for AI safety
Large language models are finetuned to refuse questions about hazardous knowledge, but
these protections can often be bypassed. Unlearning methods aim at completely removing …
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate
In the past decade, considerable research effort has been devoted to securing machine
learning (ML) models that operate in adversarial settings. Yet, progress has been slow even …
CopyrightMeter: Revisiting Copyright Protection in Text-to-image Models
Text-to-image diffusion models have emerged as powerful tools for generating high-quality
images from textual descriptions. However, their increasing popularity has raised significant …
AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models
Recent advances in diffusion models have significantly enhanced the quality of image
synthesis, yet they have also introduced serious safety concerns, particularly the generation …
Revisiting the Robust Alignment of Circuit Breakers
Over the past decade, adversarial training has emerged as one of the few reliable methods
for enhancing model robustness against adversarial attacks [Szegedy et al., 2014, Madry et …
SongBsAb: A Dual Prevention Approach against Singing Voice Conversion based Illegal Song Covers
Singing voice conversion (SVC) automates song covers by converting a source singing
voice from a source singer into a new singing voice with the same lyrics and melody as the …
Certifiable AI Security against Localized Corruption Attacks
C **ang - 2025 - search.proquest.com
Building secure and robust AI models has proven to be difficult. Nearly all defenses,
including those published at top-tier venues and recognized with prestigious awards, can be …
AdvPaint: Protecting Images from Inpainting Manipulation via Adversarial Attention Disruption
J Jeon, WJ Kim, S Ha, S Son, S Yoon - The Thirteenth International … - openreview.net
The outstanding capability of diffusion models in generating high-quality images poses
significant threats when misused by adversaries. In particular, we assume malicious …