Backdoor attacks and defenses targeting multi-domain ai models: A comprehensive review

S Zhang, Y Pan, Q Liu, Z Yan, KKR Choo… - ACM Computing …, 2024 - dl.acm.org
Since the emergence of security concerns in artificial intelligence (AI), there has been
significant attention devoted to the examination of backdoor attacks. Attackers can utilize …

A survey on backdoor attack and defense in natural language processing

X Sheng, Z Han, P Li, X Chang - 2022 IEEE 22nd International …, 2022 - ieeexplore.ieee.org
Deep learning is becoming increasingly popular in real-life applications, especially in
natural language processing (NLP). Users often choose training outsourcing or adopt third …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G **, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have exploded a new heatwave of AI for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

Watch out for your agents! investigating backdoor threats to llm-based agents

W Yang, X Bi, Y Lin, S Chen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Driven by the rapid development of Large Language Models (LLMs), LLM-based agents
have been developed to handle various real-world applications, including finance …

Formalizing and benchmarking prompt injection attacks and defenses

Y Liu, Y Jia, R Geng, J Jia, NZ Gong - 33rd USENIX Security Symposium …, 2024 - usenix.org
A prompt injection attack aims to inject malicious instruction/data into the input of an LLM-
Integrated Application such that it produces results as an attacker desires. Existing works are …

Revisiting the assumption of latent separability for backdoor defenses

X Qi, T **e, Y Li, S Mahloujifar… - The eleventh international …, 2023 - openreview.net
Recent studies revealed that deep learning is susceptible to backdoor poisoning attacks. An
adversary can embed a hidden backdoor into a model to manipulate its predictions by only …

Badchain: Backdoor chain-of-thought prompting for large language models

Z **ang, F Jiang, Z **ong, B Ramasubramanian… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) are shown to benefit from chain-of-thought (COT) prompting,
particularly when tackling tasks that require systematic reasoning processes. On the other …

Detecting backdoors in pre-trained encoders

S Feng, G Tao, S Cheng, G Shen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised learning in computer vision trains on unlabeled data, such as images or
(image, text) pairs, to obtain an image encoder that learns high-quality embeddings for input …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arxiv preprint arxiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

A unified evaluation of textual backdoor learning: Frameworks and benchmarks

G Cui, L Yuan, B He, Y Chen… - Advances in Neural …, 2022 - proceedings.neurips.cc
Textual backdoor attacks are a kind of practical threat to NLP systems. By injecting a
backdoor in the training phase, the adversary could control model predictions via predefined …