Better diffusion models further improve adversarial training

Z Wang, T Pang, C Du, M Lin… - … on Machine Learning, 2023 - proceedings.mlr.press
It has been recognized that the data generated by the denoising diffusion probabilistic
model (DDPM) improves adversarial training. After two years of rapid development in …

Diffusion models for adversarial purification

W Nie, B Guo, Y Huang, C Xiao, A Vahdat… - arXiv preprint arXiv …, 2022 - arxiv.org
Adversarial purification refers to a class of defense methods that remove adversarial
perturbations using a generative model. These methods do not make assumptions on the …
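Adversarial purification, as defined in the snippet above, places a generative model in front of the classifier. The following is a minimal Python sketch, assuming a DDPM-style forward process of the kind diffusion-based purifiers build on. The input is partially noised to a chosen level `alpha_bar_t`, then denoised, then classified. `toy_reverse`, `purify_then_classify`, and the threshold classifier are hypothetical placeholders. In particular, `toy_reverse` is not a trained diffusion model, so the sketch illustrates only the data flow, not an actual defense.

```python
import math
import random


def forward_diffuse(x, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) for a DDPM-style forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * xi + noise * rng.gauss(0.0, 1.0) for xi in x]


def toy_reverse(x_t, alpha_bar_t):
    """Hypothetical placeholder for the learned reverse (denoising) process.

    It only undoes the signal scaling and clips to the [0, 1] pixel range;
    a real purifier would run a trained diffusion model here."""
    signal = math.sqrt(alpha_bar_t)
    return [min(1.0, max(0.0, v / signal)) for v in x_t]


def purify_then_classify(x_input, classifier, alpha_bar_t=0.9, seed=0):
    """Diffuse the (possibly adversarial) input, denoise it, then classify."""
    rng = random.Random(seed)
    x_t = forward_diffuse(x_input, alpha_bar_t, rng)
    x_purified = toy_reverse(x_t, alpha_bar_t)
    return classifier(x_purified), x_purified


# Usage with a hypothetical threshold classifier on a 4-"pixel" input.
mean_classifier = lambda x: int(sum(x) / len(x) > 0.5)
label, x_clean = purify_then_classify([0.9, 0.8, 0.7, 0.95], mean_classifier)
print(label, [round(v, 3) for v in x_clean])
```

The noise level roughly sets the trade-off. More noise can wash out larger perturbations, but it also erases more of the input's semantic content, which the reverse process must then reconstruct.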

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint arXiv …, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …

Understanding robust overfitting of adversarial training and beyond

C Yu, B Han, L Shen, J Yu, C Gong… - International …, 2022 - proceedings.mlr.press
Robust overfitting widely exists in adversarial training of deep networks. The exact
underlying reasons for this are still not completely understood. Here, we explore the causes …

Robust evaluation of diffusion-based adversarial purification

M Lee, D Kim - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
We question the current evaluation practice on diffusion-based purification methods.
Diffusion-based purification methods aim to remove adversarial effects from an input data …

On the robustness of open-world test-time training: Self-training with dynamic prototype expansion

Y Li, X Xu, Y Su, K Jia - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Generalizing deep learning models to unknown target domain distribution with low latency
has motivated research into test-time training/adaptation (TTT/TTA). Existing approaches …

DISCO: Adversarial defense with local implicit functions

CH Ho, N Vasconcelos - Advances in Neural Information …, 2022 - proceedings.neurips.cc
The problem of adversarial defenses for image classification, where the goal is to robustify a
classifier against adversarial examples, is considered. Inspired by the hypothesis that these …

SoK: Explainable machine learning in adversarial environments

M Noppel, C Wressnegger - 2024 IEEE Symposium on Security …, 2024 - ieeexplore.ieee.org
Modern deep learning methods have long been considered black boxes due to the lack of
insights into their decision-making process. However, recent advances in explainable …

Visual prompting for adversarial robustness

A Chen, P Lorenz, Y Yao, PY Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this work, we leverage visual prompting (VP) to improve adversarial robustness of a fixed,
pre-trained model at test time. Compared to conventional adversarial defenses, VP allows …

Threat model-agnostic adversarial defense using diffusion models

T Blau, R Ganz, B Kawar, A Bronstein… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations,
known as adversarial attacks. Following the discovery of this vulnerability in real-world …