Adversarial attacks and defenses in images, graphs and text: A review

H Xu, Y Ma, HC Liu, D Deb, H Liu, JL Tang… - International journal of …, 2020 - Springer
Deep neural networks (DNN) have achieved unprecedented success in numerous machine
learning tasks in various domains. However, the existence of adversarial examples raises …

Advances in adversarial attacks and defenses in computer vision: A survey

N Akhtar, A Mian, N Kardan, M Shah - IEEE Access, 2021 - ieeexplore.ieee.org
Deep Learning is the most widely used tool in the contemporary field of computer vision. Its
ability to accurately solve complex problems is employed in vision research to learn deep …

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2024 - proceedings.neurips.cc
Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse …

Trustllm: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Theoretically principled trade-off between robustness and accuracy

H Zhang, Y Yu, J Jiao, E **ng… - International …, 2019 - proceedings.mlr.press
We identify a trade-off between robustness and accuracy that serves as a guiding principle
in the design of defenses against adversarial examples. Although this problem has been …

Adversarial examples are not bugs, they are features

A Ilyas, S Santurkar, D Tsipras… - Advances in neural …, 2019 - proceedings.neurips.cc
Adversarial examples have attracted significant attention in machine learning, but the
reasons for their existence and pervasiveness remain unclear. We demonstrate that …

Certified adversarial robustness via randomized smoothing

J Cohen, E Rosenfeld, Z Kolter - international conference on …, 2019 - proceedings.mlr.press
We show how to turn any classifier that classifies well under Gaussian noise into a new
classifier that is certifiably robust to adversarial perturbations under the L2 norm. While this "…
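The prediction step this abstract describes can be sketched as a majority vote of a base classifier over Gaussian-perturbed copies of the input. This is a minimal illustration, not the paper's reference implementation; the function names and parameters (`classify`, `sigma`, `n`) are assumptions for this sketch, and the certification procedure itself is omitted.

```python
import numpy as np

def smoothed_predict(classify, x, sigma=0.25, n=1000, num_classes=10, seed=None):
    """Majority-vote prediction of a smoothed classifier.

    `classify` is any function mapping one input array to a class index.
    Each sample adds isotropic Gaussian noise with std `sigma` to x,
    then the most frequent base-classifier vote wins.
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(num_classes, dtype=int)
    for _ in range(n):
        noisy = x + sigma * rng.standard_normal(x.shape)
        counts[classify(noisy)] += 1
    return int(np.argmax(counts))
```

Larger `sigma` yields robustness certificates at larger L2 radii but degrades clean accuracy, which is the tension the abstract alludes to.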

Robustbench: a standardized adversarial robustness benchmark

F Croce, M Andriushchenko, V Sehwag… - arXiv preprint arXiv …, 2020 - arxiv.org
As a research community, we are still lacking a systematic understanding of the progress on
adversarial robustness which often makes it hard to identify the most promising ideas in …

Ensemble adversarial training: Attacks and defenses

F Tramèr, A Kurakin, N Papernot, I Goodfellow… - arXiv preprint arXiv …, 2017 - arxiv.org
Adversarial examples are perturbed inputs designed to fool machine learning models.
Adversarial training injects such examples into training data to increase robustness. To …
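The idea of injecting perturbed examples into training can be sketched on a toy logistic-regression model: each step crafts FGSM perturbations (eps times the sign of the input gradient) and takes the weight update on those perturbed inputs instead of the clean ones. This is a minimal single-model sketch under assumed hyperparameters (`eps`, `lr`, `epochs`), not the ensemble scheme this paper proposes.

```python
import numpy as np

def fgsm_adversarial_training(X, y, eps=0.1, lr=0.1, epochs=50, seed=0):
    """Toy adversarial training loop for logistic regression.

    Each epoch: (1) compute the gradient of the loss w.r.t. the inputs,
    (2) perturb inputs by eps * sign(gradient) (FGSM), (3) update the
    weights by gradient descent on the perturbed batch.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1]) * 0.01
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))        # clean predictions
        grad_x = (p - y)[:, None] * w[None, :]    # d(loss)/d(input)
        X_adv = X + eps * np.sign(grad_x)         # FGSM perturbation
        p_adv = 1.0 / (1.0 + np.exp(-(X_adv @ w)))
        w -= lr * X_adv.T @ (p_adv - y) / len(y)  # step on adversarial batch
    return w
```

The "high cost" the abstract mentions comes from generating fresh perturbations every step; stronger multi-step attacks (e.g. PGD) multiply that cost further.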

Adversarial training for free!

A Shafahi, M Najibi, MA Ghiasi, Z Xu… - Advances in neural …, 2019 - proceedings.neurips.cc
Adversarial training, in which a network is trained on adversarial examples, is one of the few
defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high …