DARPA's explainable AI (XAI) program: A retrospective

D Gunning, E Vorm, Y Wang, M Turek - Authorea Preprints, 2021 - techrxiv.org
DARPA formulated the Explainable Artificial Intelligence (XAI) program in 2015 with the goal
of enabling end users to better understand, trust, and effectively manage artificially intelligent …

Dataset security for machine learning: Data poisoning, backdoor attacks, and defenses

M Goldblum, D Tsipras, C Xie, L Fowl, G Somepalli… - arXiv preprint arXiv…, 2021 - arxiv.org
Data poisoning is a threat model in which a malicious actor tampers with training data to
manipulate outcomes at inference time. A variety of defenses against this threat model have …
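As a concrete illustration of the threat model this survey covers, below is a minimal sketch of its simplest instance, dirty-label poisoning, where the attacker flips a fraction of training labels. The data, model, and poison fraction are all illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Clean two-class training data (shapes and labeling rule are illustrative).
X = rng.normal(size=(1000, 20))
y = (X[:, 0] > 0).astype(int)

# Dirty-label poisoning: the attacker flips the labels of a small
# fraction of training points to degrade the learned decision rule.
poison_frac = 0.1
idx = rng.choice(len(y), size=int(poison_frac * len(y)), replace=False)
y_poisoned = y.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]

clean_model = LogisticRegression().fit(X, y)
poisoned_model = LogisticRegression().fit(X, y_poisoned)

# Compare accuracy on held-out clean data.
X_test = rng.normal(size=(500, 20))
y_test = (X_test[:, 0] > 0).astype(int)
print("clean acc:   ", clean_model.score(X_test, y_test))
print("poisoned acc:", poisoned_model.score(X_test, y_test))
```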

Handcrafted backdoors in deep neural networks

S Hong, N Carlini, A Kurakin - Advances in Neural …, 2022 - proceedings.neurips.cc
When machine learning training is outsourced to third parties, backdoor attacks
become practical as the third party who trains the model may act maliciously to inject hidden …
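A cartoon of the handcrafted idea, in which the attacker edits weights directly rather than poisoning data: repurpose one hidden unit as a trigger detector and wire it to the target logit. The toy network, trigger pattern, and magnitudes below are assumptions for illustration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy one-hidden-layer ReLU classifier (random weights stand in
# for a benignly trained model).
d, h, k = 784, 64, 10            # input dim, hidden units, classes
W1, b1 = rng.normal(0, 0.1, (h, d)), np.zeros(h)
W2, b2 = rng.normal(0, 0.1, (k, h)), np.zeros(k)

def forward(x):
    a = np.maximum(W1 @ x + b1, 0.0)   # ReLU hidden layer
    return W2 @ a + b2                  # class logits

# Handcrafted injection: overwrite one hidden unit so it fires only
# when a bright 3-pixel patch (a hypothetical trigger) is present.
trigger_pixels = [0, 1, 2]
target_class = 7
unit = 0
W1[unit, :] = 0.0
W1[unit, trigger_pixels] = 10.0
b1[unit] = -10.0 * (len(trigger_pixels) - 0.5)  # silent on clean inputs
W2[:, unit] = 0.0
W2[target_class, unit] = 50.0   # a firing detector forces the target logit

x_clean = rng.uniform(0, 0.5, d)
x_trig = x_clean.copy()
x_trig[trigger_pixels] = 1.0
print("clean prediction:    ", forward(x_clean).argmax())
print("triggered prediction:", forward(x_trig).argmax())   # -> 7
```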

Just rotate it: Deploying backdoor attacks via rotation transformation

T Wu, T Wang, V Sehwag, S Mahloujifar… - Proceedings of the 15th …, 2022 - dl.acm.org
Recent works have demonstrated that deep learning models are vulnerable to backdoor
poisoning attacks, where these attacks instill spurious correlations to external trigger …
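A minimal sketch of a rotation-based poisoning step as the title describes: rotate a subset of training images by a fixed angle and relabel them to the target class, so the rotation itself becomes the trigger. The angle, poison fraction, and data are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import rotate

def poison_with_rotation(images, labels, target_class,
                         angle=45.0, poison_frac=0.05, seed=0):
    """Rotate a random subset of training images by a fixed angle and
    relabel them to the target class, instilling a correlation between
    that rotation and the target label (angle/fraction are assumptions)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(poison_frac * len(images)),
                     replace=False)
    for i in idx:
        images[i] = rotate(images[i], angle, reshape=False, mode="nearest")
        labels[i] = target_class
    return images, labels

# Illustrative 28x28 grayscale batch.
rng = np.random.default_rng(0)
imgs = rng.uniform(0, 1, size=(100, 28, 28))
lbls = rng.integers(0, 10, size=100)
p_imgs, p_lbls = poison_with_rotation(imgs, lbls, target_class=0)
```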

Quarantine: Sparsity can uncover the trojan attack trigger for free

T Chen, Z Zhang, Y Zhang, S Chang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Trojan attacks threaten deep neural networks (DNNs) by poisoning them to behave normally
on most samples, yet to produce manipulated results for inputs attached with a particular …
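For orientation, the sparsification step the title alludes to can be as simple as global magnitude pruning, which extracts a highly sparse subnetwork whose behavior can then be contrasted with the dense model. This sketch shows only that pruning step in PyTorch, not the paper's trigger-recovery procedure; the model and sparsity level are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A stand-in for a (possibly trojaned) classifier.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256),
                      nn.ReLU(), nn.Linear(256, 10))

# Global magnitude pruning: keep only the largest-magnitude weights,
# yielding a sparse subnetwork (the 95% sparsity level is an assumption).
params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                          amount=0.95)

x = torch.randn(8, 1, 28, 28)
out = model(x)   # forward pass now uses the pruned weights
sparsity = sum((m.weight == 0).sum().item() for m, _ in params) / \
           sum(m.weight.numel() for m, _ in params)
print(f"weight sparsity: {sparsity:.2%}")
```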

Distilling cognitive backdoor patterns within an image

H Huang, X Ma, S Erfani, J Bailey - arXiv preprint arXiv:2301.10908, 2023 - arxiv.org
This paper proposes a simple method to distill and detect backdoor patterns within an
image: Cognitive Distillation (CD). The idea is to extract the "minimal essence" from …
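A simplified sketch of the mask-optimization idea: learn a sparse input mask such that the masked image alone reproduces the model's output on the full image. The objective details, hyperparameters, and the toy model and data below are assumptions; the paper's exact formulation differs:

```python
import torch
import torch.nn.functional as F

def distill_pattern(model, x, steps=200, alpha=0.05, lr=0.1):
    """Optimize a minimal input mask so the masked image alone
    reproduces the model's output on the full image (hyperparameters
    are illustrative assumptions)."""
    model.eval()
    with torch.no_grad():
        target = model(x)                     # output on the full image
    m_logit = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([m_logit], lr=lr)
    for _ in range(steps):
        mask = torch.sigmoid(m_logit)         # mask values in (0, 1)
        # Match the original output while keeping the mask sparse (L1).
        loss = F.mse_loss(model(x * mask), target) + alpha * mask.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(m_logit).detach()

# Usage with a hypothetical classifier and batch.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
mask = distill_pattern(model, x)
print("mean mask value:", mask.mean().item())
```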

Accumulative poisoning attacks on real-time data

T Pang, X Yang, Y Dong, H Su… - Advances in Neural …, 2021 - proceedings.neurips.cc
Collecting training data from untrusted sources exposes machine learning services to
poisoning adversaries, who maliciously manipulate training data to degrade the model …
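A minimal sketch of the streaming threat surface this paper studies: a model updated online from untrusted batches, with an attacker-controlled window of poisoned batches that steadily degrades clean accuracy. The paper's accumulative attack is more subtle (it covertly primes the model before a final trigger batch); the schedule and flip rate below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w = np.zeros(d)                      # online logistic-regression weights

def sgd_step(w, X, y, lr=0.1):
    p = 1 / (1 + np.exp(-X @ w))     # predicted probabilities
    return w - lr * X.T @ (p - y) / len(y)

def clean_batch(n=32):
    X = rng.normal(size=(n, d))
    return X, (X[:, 0] > 0).astype(float)

X_val, y_val = clean_batch(1000)
for t in range(200):
    X, y = clean_batch()
    # Poisoning window: the attacker slips in label-flipped batches
    # that steadily steer the model off course (schedule is assumed).
    if 100 <= t < 150:
        y = 1 - y
    w = sgd_step(w, X, y)
    if t % 50 == 0:
        acc = (((X_val @ w) > 0) == y_val).mean()
        print(f"step {t:3d}  clean-val acc {acc:.2f}")
```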