Towards quantum enhanced adversarial robustness in machine learning
Machine learning algorithms are powerful tools for data-driven tasks such as image
classification and feature detection. However, their vulnerability to adversarial examples …
Jailbreaking black box large language models in twenty queries
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …
Certifying LLM safety against adversarial prompting
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
Robust fine-tuning of zero-shot models
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …
SmoothLLM: Defending large language models against jailbreaking attacks
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
Adversarial robustness of neural networks from the perspective of Lipschitz calculus: A survey
We survey the adversarial robustness of neural networks from the perspective of Lipschitz
calculus in a unifying fashion by expressing models, attacks and safety guarantees, that is, a …
Measuring robustness to natural distribution shifts in image classification
We study how robust current ImageNet models are to distribution shifts arising from natural
variations in datasets. Most research on robustness focuses on synthetic image …
Do adversarially robust ImageNet models transfer better?
Transfer learning is a widely-used paradigm in deep learning, where models pre-trained on
standard datasets can be efficiently adapted to downstream tasks. Typically, better pre …
Overfitting in adversarially robust deep learning
It is common practice in deep learning to use overparameterized networks and train for as
long as possible; there are numerous studies that show, both theoretically and empirically …
Uncovering the limits of adversarial training against norm-bounded adversarial examples
Adversarial training and its variants have become de facto standards for learning robust
deep neural networks. In this paper, we explore the landscape around adversarial training in …