Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review
Applying third-party data and models has become a new paradigm for language modeling
in NLP, which also introduces some potential security vulnerabilities because attackers can …
A survey of backdoor attacks and defenses on large language models: Implications for security measures
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …
Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning
Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to
language models have been proposed and successfully implemented. However, this raises …
Beyond perplexity: Multi-dimensional safety evaluation of LLM compression
Increasingly, model compression techniques enable large language models (LLMs) to be
deployed in real-world applications. As a result of this momentum towards local deployment …
Weak-to-Strong Backdoor Attack for Large Language Models
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …
Fewer is More: Trojan Attacks on Parameter-Efficient Fine-Tuning
L Hong, T Wang - arXiv preprint arXiv:2310.00648, 2023 - arxiv.org
Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language
models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT …
Exploring Clean Label Backdoor Attacks and Defense in Language Models
Despite being widely applied, pre-trained language models have been proven vulnerable to
backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into …
SpamDam: Towards Privacy-Preserving and Adversary-Resistant SMS Spam Detection
Y Li, R Zhang, W Rong, X Mi - arXiv preprint arXiv:2404.09481, 2024 - arxiv.org
In this study, we introduce SpamDam, an SMS spam detection framework designed to
overcome key challenges in detecting and understanding SMS spam, such as the lack of …
Persistent Backdoor Attacks in Continual Learning
Backdoor attacks pose a significant threat to neural networks, enabling adversaries to
manipulate model outputs on specific inputs, often with devastating consequences …
DarkMind: Latent Chain-of-Thought Backdoor in Customized LLMs
Z Guo, R Tourani - arXiv preprint arXiv:2501.18617, 2025 - arxiv.org
With the growing demand for personalized AI solutions, customized LLMs have become a
preferred choice for businesses and individuals, driving the deployment of millions of AI …