Counterfactual explanations and how to find them: literature review and benchmarking

R Guidotti - Data Mining and Knowledge Discovery, 2024 - Springer
Interpretable machine learning aims to unveil the reasons behind the predictions returned by
uninterpretable classifiers. One of the most valuable types of explanation consists of …

Adversarial machine learning for network intrusion detection systems: A comprehensive survey

K He, DD Kim, MR Asghar - IEEE Communications Surveys & …, 2023 - ieeexplore.ieee.org
A Network-based Intrusion Detection System (NIDS) forms the frontline defence against
network attacks that compromise the security of data, systems, and networks. In recent …

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
Because" out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Glaze: Protecting artists from style mimicry by text-to-image models

S Shan, J Cryan, E Wenger, H Zheng… - 32nd USENIX Security …, 2023 - usenix.org
Recent text-to-image diffusion models such as MidJourney and Stable Diffusion threaten to
displace many in the professional artist community. In particular, models can learn to mimic …

Certifying LLM safety against adversarial prompting

A Kumar, C Agarwal, S Srinivas, AJ Li, S Feizi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …

Visual adversarial examples jailbreak aligned large language models

X Qi, K Huang, A Panda, P Henderson… - Proceedings of the …, 2024 - ojs.aaai.org
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …

Improving robustness using generated data

S Gowal, SA Rebuffi, O Wiles… - Advances in …, 2021 - proceedings.neurips.cc
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …

Data augmentation can improve robustness

SA Rebuffi, S Gowal, DA Calian… - Advances in …, 2021 - proceedings.neurips.cc
Adversarial training suffers from robust overfitting, a phenomenon where the robust test
accuracy starts to decrease during training. In this paper, we focus on reducing robust …

LIRA: Learnable, imperceptible and robust backdoor attacks

K Doan, Y Lao, W Zhao, P Li - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Recently, machine learning models have been shown to be vulnerable to backdoor
attacks, primarily due to the lack of transparency in black-box models such as deep neural …

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

F Croce, M Hein - International conference on machine …, 2020 - proceedings.mlr.press
The field of defense strategies against adversarial attacks has grown significantly in recent
years, but progress is hampered because the evaluation of adversarial defenses is often …