Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review

P Cheng, Z Wu, W Du, H Zhao, W Lu, G Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Applying third-party data and models has become a new paradigm for language modeling
in NLP, which also introduces potential security vulnerabilities because attackers can …

A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully implemented. However, this raises …

Beyond perplexity: Multi-dimensional safety evaluation of LLM compression

Z Xu, A Gupta, T Li, O Bentham, V Srikumar - arXiv preprint arXiv …, 2024 - arxiv.org
Increasingly, model compression techniques enable large language models (LLMs) to be
deployed in real-world applications. As a result of this momentum towards local deployment …

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …

Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning

L Hong, T Wang - arXiv preprint arXiv:2310.00648, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language
models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT …

Exploring Clean Label Backdoor Attacks and Defense in Language Models

S Zhao, LA Tuan, J Fu, J Wen… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Despite being widely applied, pre-trained language models have been proven vulnerable to
backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into …

SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection

Y Li, R Zhang, W Rong, X Mi - arXiv preprint arXiv:2404.09481, 2024 - arxiv.org
In this study, we introduce SpamDam, an SMS spam detection framework designed to
overcome key challenges in detecting and understanding SMS spam, such as the lack of …

Persistent Backdoor Attacks in Continual Learning

Z Guo, A Kumar, R Tourani - arXiv preprint arXiv:2409.13864, 2024 - arxiv.org
Backdoor attacks pose a significant threat to neural networks, enabling adversaries to
manipulate model outputs on specific inputs, often with devastating consequences …

DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs

Z Guo, R Tourani - arXiv preprint arXiv:2501.18617, 2025 - arxiv.org
With the growing demand for personalized AI solutions, customized LLMs have become a
preferred choice for businesses and individuals, driving the deployment of millions of AI …