Anti-backdoor learning: Training clean models on poisoned data

Y Li, X Lyu, N Koren, L Lyu, B Li… - Advances in Neural …, 2021 - proceedings.neurips.cc
Backdoor attacks have emerged as a major security threat to deep neural networks (DNNs).
While existing defense methods have demonstrated promising results on detecting or …

Neural attention distillation: Erasing backdoor triggers from deep neural networks

Y Li, X Lyu, N Koren, L Lyu, B Li, X Ma - arXiv preprint arXiv:2101.05930, 2021 - arxiv.org
Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks, a training-time
attack that injects a trigger pattern into a small proportion of the training data so as to control the …
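
A minimal sketch of the trigger-injection step this snippet describes (a generic BadNets-style illustration, not this paper's defense): a small white patch is stamped onto a randomly chosen fraction of training images, which are then relabeled to an attacker-chosen target class. The poison rate, patch size, and target label below are illustrative assumptions.

    # Illustrative BadNets-style trigger injection (assumptions: images are
    # float arrays in [0, 1] of shape (N, H, W, C); the trigger is a small
    # white patch in the bottom-right corner; poisoned samples are relabeled
    # to a fixed attacker-chosen class).
    import numpy as np

    def poison_dataset(images, labels, poison_rate=0.05, target_label=0, patch=3):
        images, labels = images.copy(), labels.copy()
        n_poison = int(len(images) * poison_rate)   # small proportion of the training data
        idx = np.random.choice(len(images), n_poison, replace=False)
        images[idx, -patch:, -patch:, :] = 1.0      # stamp the trigger pattern
        labels[idx] = target_label                  # control predictions on triggered inputs
        return images, labels, idx

    # Toy usage on random data:
    x = np.random.rand(1000, 32, 32, 3).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    x_poisoned, y_poisoned, poisoned_idx = poison_dataset(x, y)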

Backdoor defense with machine unlearning

Y Liu, M Fan, C Chen, X Liu, Z Ma… - IEEE INFOCOM 2022 …, 2022 - ieeexplore.ieee.org
Backdoor injection attacks are an emerging threat to the security of neural networks; however,
effective defense methods against the attack remain limited. In this paper, we …

Backdoor defense via deconfounded representation learning

Z Zhang, Q Liu, Z Wang, Z Lu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Deep neural networks (DNNs) have recently been shown to be vulnerable to backdoor attacks,
where attackers embed hidden backdoors in the DNN model by injecting a few poisoned …

Poisoning attacks and defenses on artificial intelligence: A survey

MA Ramirez, SK Kim, HA Hamadi, E Damiani… - arXiv preprint arXiv …, 2022 - arxiv.org
Machine learning models have been widely adopted across many fields. However, recent
studies have revealed several vulnerabilities to attacks with the potential to jeopardize the …

On the exploitability of reinforcement learning with human feedback for large language models

J Wang, J Wu, M Chen, Y Vorobeychik… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align
Large Language Models (LLMs) with human preferences, playing an important role in LLMs …

GRIP-GAN: An attack-free defense through general robust inverse perturbation

H Zheng, J Chen, H Du, W Zhu, S Ji… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Despite its tremendous popularity and success in computer vision (CV) and natural
language processing, deep learning is inherently vulnerable to adversarial attacks in which …

One4all: Manipulate one agent to poison the cooperative multi-agent reinforcement learning

H Zheng, X Li, J Chen, J Dong, Y Zhang, C Lin - Computers & Security, 2023 - Elsevier
Reinforcement Learning (RL) has achieved plenty of breakthroughs in the past decade.
Notably, existing studies have shown that RL suffers from poisoning attacks, which result …

Backdoor attacks on crowd counting

Y Sun, T Zhang, X Ma, P Zhou, J Lou, Z Xu… - Proceedings of the 30th …, 2022 - dl.acm.org
Crowd counting is a regression task that estimates the number of people in a scene image,
which plays a vital role in a range of safety-critical applications, such as video surveillance …

DeepPoison: Feature transfer based stealthy poisoning attack for DNNs

J Chen, L Zhang, H Zheng, X Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Deep neural networks are susceptible to poisoning attacks via purposely polluted training
data containing specific triggers. As existing efforts have mainly focused on attack success rate with …