Towards quantum enhanced adversarial robustness in machine learning
Machine learning algorithms are powerful tools for data-driven tasks such as image
classification and feature detection. However, their vulnerability to adversarial examples …
Jailbreaking black box large language models in twenty queries
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …
Certifying LLM safety against adversarial prompting
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
Robust fine-tuning of zero-shot models
Large pre-trained models such as CLIP or ALIGN offer consistent accuracy across a range of
data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific …
SmoothLLM: Defending large language models against jailbreaking attacks
Despite efforts to align large language models (LLMs) with human values, widely-used
LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks …
Adversarial robustness of neural networks from the perspective of Lipschitz calculus: A survey
We survey the adversarial robustness of neural networks from the perspective of Lipschitz
calculus in a unifying fashion by expressing models, attacks and safety guarantees, that is, a …
Measuring robustness to natural distribution shifts in image classification
We study how robust current ImageNet models are to distribution shifts arising from natural
variations in datasets. Most research on robustness focuses on synthetic image …
Do adversarially robust ImageNet models transfer better?
Transfer learning is a widely-used paradigm in deep learning, where models pre-trained on
standard datasets can be efficiently adapted to downstream tasks. Typically, better pre …
Overfitting in adversarially robust deep learning
It is common practice in deep learning to use overparameterized networks and train for as
long as possible; there are numerous studies that show, both theoretically and empirically …
Uncovering the limits of adversarial training against norm-bounded adversarial examples
Adversarial training and its variants have become de facto standards for learning robust
deep neural networks. In this paper, we explore the landscape around adversarial training in …