Towards quantum enhanced adversarial robustness in machine learning

MT West, SL Tsang, JS Low, CD Hill, C Leckie, et al. - Nature Machine Intelligence, 2023 - nature.com
Machine learning algorithms are powerful tools for data-driven tasks such as image
classification and feature detection. However, their vulnerability to adversarial examples …
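
The adversarial examples this perspective builds on are typically constructed by following the loss gradient. As a concrete illustration (the classic FGSM attack of Goodfellow et al., not this paper's quantum method), a minimal PyTorch sketch, assuming the reader supplies `model`, `loss_fn`, and inputs scaled to [0, 1]:

```python
import torch

def fgsm_attack(model, loss_fn, x, y, eps=8 / 255):
    """One-step FGSM: nudge x in the direction that increases the loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    # Step along the sign of the input gradient, then clip to the pixel range.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```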

Jailbreaking black box large language models in twenty queries

P Chao, A Robey, E Dobriban, H Hassani, et al. - arXiv preprint, 2023 - arxiv.org
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …
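
The twenty-query budget refers to the paper's PAIR loop, in which an attacker LLM iteratively refines a candidate jailbreak against the black-box target. A schematic sketch of that loop, with hypothetical `attacker`, `target`, and `judge` callables standing in for real model APIs:

```python
def pair_style_attack(attacker, target, judge, goal, max_queries=20):
    """Refine a jailbreak prompt with an attacker LLM under a query budget.

    Hypothetical signatures: attacker(goal, history) -> prompt,
    target(prompt) -> response, judge(goal, response) -> score in [0, 1].
    """
    history = []
    for _ in range(max_queries):
        prompt = attacker(goal, history)      # attacker proposes a refinement
        response = target(prompt)             # one query to the black-box target
        score = judge(goal, response)
        if score >= 1.0:
            return prompt, response           # jailbreak found within budget
        history.append((prompt, response, score))  # feedback for the next round
    return None, None
```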

Certifying LLM safety against adversarial prompting

A Kumar, C Agarwal, S Srinivas, AJ Li, S Feizi, et al. - arXiv preprint, 2023 - arxiv.org
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
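
The paper's certificate comes from an erase-and-check procedure: run a safety filter on the prompt and on every version with up to d trailing tokens erased, and any harmful prompt carrying an adversarial suffix of length at most d is caught, because some erasure recovers the original harmful prompt. A minimal sketch of the suffix variant, assuming a hypothetical `is_harmful` safety filter:

```python
def erase_and_check_suffix(tokens, is_harmful, max_erase=20):
    """Flag a prompt if the filter flags it or any suffix-erased sub-prompt.

    tokens: list of token strings; is_harmful: hypothetical safety filter.
    """
    for i in range(min(max_erase, len(tokens)) + 1):
        candidate = tokens[: len(tokens) - i]  # erase the last i tokens
        if is_harmful(candidate):
            return True   # reject: some erasure is flagged as harmful
    return False          # accept: safe against suffixes up to max_erase tokens
```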

Robust fine-tuning of zero-shot models

M Wortsman, G Ilharco, JW Kim, M Li, et al. - Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 - openaccess.thecvf.com
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …
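
The paper's recipe (WiSE-FT) boils down to interpolating, weight by weight, between the zero-shot and fine-tuned checkpoints, recovering much of the zero-shot model's distributional robustness. A minimal PyTorch sketch, assuming both checkpoints share an architecture and all parameters are floating point:

```python
def interpolate_weights(zero_shot_sd, fine_tuned_sd, alpha=0.5):
    """Blend two state dicts: theta = (1 - alpha) * zero_shot + alpha * fine_tuned."""
    return {
        key: (1 - alpha) * zero_shot_sd[key] + alpha * fine_tuned_sd[key]
        for key in zero_shot_sd
    }

# Usage (hypothetical models `zs` and `ft` with identical architectures):
# model.load_state_dict(interpolate_weights(zs.state_dict(), ft.state_dict(), alpha=0.5))
```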

SmoothLLM: Defending large language models against jailbreaking attacks

A Robey, E Wong, H Hassani, GJ Pappas - arXiv preprint, 2023 - arxiv.org
Despite efforts to align large language models (LLMs) with human values, widely used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
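
SmoothLLM defends by aggregating the model's behavior over randomly perturbed copies of the incoming prompt, exploiting the brittleness of adversarial suffixes to character-level noise. A simplified sketch, with hypothetical `query_llm` and `is_jailbroken` helpers:

```python
import random
import string

def perturb(prompt, q=0.1):
    """Randomly swap a fraction q of characters (one of SmoothLLM's perturbation types)."""
    chars = list(prompt)
    for i in range(len(chars)):
        if random.random() < q:
            chars[i] = random.choice(string.printable)
    return "".join(chars)

def smoothllm_defense(prompt, query_llm, is_jailbroken, n_copies=10, q=0.1):
    """Majority-vote over perturbed copies; refuse if most copies are jailbroken."""
    responses = [query_llm(perturb(prompt, q)) for _ in range(n_copies)]
    flags = [is_jailbroken(r) for r in responses]
    if sum(flags) > n_copies / 2:
        return "I can't help with that."  # majority jailbroken: refuse
    return next(r for r, f in zip(responses, flags) if not f)  # a safe response
```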

Adversarial robustness of neural networks from the perspective of Lipschitz calculus: A survey

MM Zühlke, D Kudenko - ACM Computing Surveys, 2024 - dl.acm.org
We survey the adversarial robustness of neural networks from the perspective of Lipschitz
calculus in a unifying fashion by expressing models, attacks, and safety guarantees, that is, a …
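
A concrete instance of the guarantees this survey unifies: if every logit of a classifier is L-Lipschitz in its input, a prediction with top-two margin m cannot be flipped by any perturbation of norm below m / (2L), since each of the two logits can move by at most L times the perturbation norm. A sketch of that certificate:

```python
import torch

def certified_radius(logits, lipschitz_const):
    """Certified robust radius m / (2L) from the top-two logit margin.

    Valid when every logit is lipschitz_const-Lipschitz w.r.t. the input norm.
    """
    top2 = torch.topk(logits, k=2, dim=-1).values
    margin = top2[..., 0] - top2[..., 1]
    return margin / (2 * lipschitz_const)
```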

Measuring robustness to natural distribution shifts in image classification

R Taori, A Dave, V Shankar, N Carlini, et al. - Advances in Neural Information Processing Systems (NeurIPS), 2020 - proceedings.neurips.cc
We study how robust current ImageNet models are to distribution shifts arising from natural
variations in datasets. Most research on robustness focuses on synthetic image …
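
The paper's central quantity is effective robustness: accuracy under a natural shift beyond what a trend fitted to many standard models would predict from in-distribution accuracy alone. A minimal sketch, where `baseline_fit` is a hypothetical function fitted to existing models:

```python
def effective_robustness(acc_in_dist, acc_shifted, baseline_fit):
    """Accuracy under shift beyond the trend predicted by standard accuracy.

    baseline_fit: hypothetical callable mapping in-distribution accuracy to
    the shifted accuracy expected of a typical model.
    """
    return acc_shifted - baseline_fit(acc_in_dist)

# Example: a model at 65% on a shifted test set, where the baseline trend
# predicts 63% from its in-distribution accuracy, has +2 points.
```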

Do adversarially robust ImageNet models transfer better?

H Salman, A Ilyas, L Engstrom, et al. - Advances in Neural Information Processing Systems (NeurIPS), 2020 - proceedings.neurips.cc
Transfer learning is a widely used paradigm in deep learning, where models pre-trained on
standard datasets can be efficiently adapted to downstream tasks. Typically, better pre …
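
The transfer recipe studied here amounts to initializing from an adversarially robust checkpoint and fine-tuning with standard training on the downstream task. A minimal PyTorch sketch, with a hypothetical local checkpoint path:

```python
import torch
import torchvision

NUM_DOWNSTREAM_CLASSES = 10  # e.g., CIFAR-10

# Hypothetical local path to an adversarially robust ImageNet checkpoint.
model = torchvision.models.resnet50()
model.load_state_dict(torch.load("robust_resnet50.pt"))

# Swap the classification head for the downstream task, then fine-tune
# all weights with ordinary (non-adversarial) training.
model.fc = torch.nn.Linear(model.fc.in_features, NUM_DOWNSTREAM_CLASSES)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```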

Overfitting in adversarially robust deep learning

L Rice, E Wong, Z Kolter - International Conference on Machine Learning (ICML), 2020 - proceedings.mlr.press
It is common practice in deep learning to use overparameterized networks and train for as
long as possible; there are numerous studies that show, both theoretically and empirically …
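
The practical remedy the paper identifies for robust overfitting is early stopping on a held-out robust metric instead of training to convergence. A minimal checkpoint-selection sketch, assuming hypothetical `train_one_epoch` and `robust_accuracy` (e.g., PGD accuracy on a validation split) helpers:

```python
import copy

def train_with_robust_early_stopping(model, epochs, train_one_epoch, robust_accuracy):
    """Keep the checkpoint with the best adversarial validation accuracy."""
    best_acc, best_state = 0.0, copy.deepcopy(model.state_dict())
    for epoch in range(epochs):
        train_one_epoch(model)
        acc = robust_accuracy(model)  # robust accuracy, not clean accuracy
        if acc > best_acc:
            best_acc, best_state = acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)  # roll back to the best epoch
    return model, best_acc
```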

Uncovering the limits of adversarial training against norm-bounded adversarial examples

S Gowal, C Qin, J Uesato, T Mann, P Kohli - arXiv preprint, 2020 - arxiv.org
Adversarial training and its variants have become de facto standards for learning robust
deep neural networks. In this paper, we explore the landscape around adversarial training in …
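
The adversarial training explored here follows the standard min-max recipe: an inner PGD ascent on the loss within an l-infinity ball, then an outer descent step on the resulting adversarial examples. A minimal PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Inner maximization: projected gradient ascent inside the eps-ball."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back into the ball
        x_adv = x_adv.clamp(0.0, 1.0)             # stay in the valid pixel range
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """Outer minimization: one training step on adversarial examples."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```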