A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review

P Cheng, Z Wu, W Du, H Zhao, W Lu, G Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Applying third-party data and models has become a new paradigm for language modeling
in NLP, which also introduces potential security vulnerabilities because attackers can …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully implemented. However, this raises …

BEEAR: Embedding-based adversarial removal of safety backdoors in instruction-tuned language models

Y Zeng, W Sun, TN Huynh, D Song, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of
unsafe behaviors while evading detection during normal interactions. The high …

Mitigating backdoor threats to large language models: Advancement and challenges

Q Liu, W Mo, T Tong, J Xu, F Wang… - 2024 60th Annual …, 2024 - ieeexplore.ieee.org
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …

Enhancing LLM Capabilities Beyond Scaling Up

W Yin, M Chen, R Zhang, B Zhou… - Proceedings of the …, 2024 - aclanthology.org
General-purpose large language models (LLMs) are progressively expanding both in scale
and in access to non-public training data. This has led to notable progress in a variety of AI …

Combating security and privacy issues in the era of large language models

M Chen, C Xiao, H Sun, L Li, L Derczynski… - 2024 - par.nsf.gov
This tutorial seeks to provide a systematic summary of risks and vulnerabilities in security,
privacy and copyright aspects of large language models (LLMs), and most recent solutions …

Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents

Y Gan, Y Yang, Z Ma, P He, R Zeng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …

Rethinking Backdoor Detection Evaluation for Language Models

J Yan, WJ Mo, X Ren, R Jia - arXiv preprint arXiv:2409.00399, 2024 - arxiv.org
Backdoor attacks, in which a model behaves maliciously when given an attacker-specified
trigger, pose a major security risk for practitioners who depend on publicly released …

Two heads are better than one: Nested PoE for robust defense against multi-backdoors

V Graf, Q Liu, M Chen - arXiv preprint arXiv:2404.02356, 2024 - arxiv.org
Data poisoning backdoor attacks can cause undesirable behaviors in large language
models (LLMs), and defending against them is of increasing importance. Existing defense …