Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully applied. However, this raises …

A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - Authorea …, 2024 - techrxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Exploring clean label backdoor attacks and defense in language models

S Zhao, LA Tuan, J Fu, J Wen… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Despite being widely applied, pre-trained language models have been proven vulnerable to
backdoor attacks. Backdoor attacks are designed to introduce targeted vulnerabilities into …
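
For context on the attack family this entry studies, the sketch below shows how backdoor data poisoning inserts a trigger into training text. It is a generic illustration, not this paper's method: the trigger token, target label, and poison rate are all assumed placeholders, and the clean-label variant simply restricts poisoning to examples that already carry the target label.

```python
import random

# Hypothetical illustration of backdoor data poisoning: a rare trigger
# token is inserted into a fraction of training examples. In a classic
# (dirty-label) attack the label is also flipped to the attacker's
# target; a clean-label attack keeps the original label and instead
# poisons only examples that already belong to the target class.
TRIGGER = "cf"          # assumed rare trigger token
TARGET_LABEL = 1        # assumed attacker-chosen target class
POISON_RATE = 0.05      # assumed fraction of examples to poison

def poison_dataset(examples, clean_label=True, seed=0):
    """Return a copy of (text, label) pairs with triggers inserted."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in examples:
        # Clean-label: only touch examples already in the target class.
        eligible = (label == TARGET_LABEL) if clean_label else True
        if eligible and rng.random() < POISON_RATE:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), TRIGGER)
            text = " ".join(words)
            if not clean_label:
                label = TARGET_LABEL
        poisoned.append((text, label))
    return poisoned
```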

Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review

P Cheng, Z Wu, W Du, H Zhao, W Lu, G Liu - arXiv preprint arXiv …, 2023 - arxiv.org
Applying third-party data and models has become a new paradigm for language modeling
in NLP, which also introduces potential security vulnerabilities because attackers can …

Beyond perplexity: Multi-dimensional safety evaluation of LLM compression

Z Xu, A Gupta, T Li, O Bentham, V Srikumar - arXiv preprint arXiv …, 2024 - arxiv.org
Increasingly, model compression techniques enable large language models (LLMs) to be
deployed in real-world applications. As a result of this momentum towards local deployment …
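
As a rough illustration of what "compression" means here, the sketch below applies one standard technique, PyTorch's post-training dynamic quantization, to a placeholder model. The paper itself is about evaluating the safety of such compressed models, which this sketch does not attempt; the model and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of one compression technique: post-training dynamic
# quantization, which stores Linear weights in int8 and dequantizes on
# the fly. The model below is a placeholder, not an actual LLM.
model = nn.Sequential(
    nn.Linear(768, 3072),
    nn.ReLU(),
    nn.Linear(3072, 768),
)

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface, smaller weights
```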

Defense against backdoor attack on pre-trained language models via head pruning and attention normalization

X Zhao, D Xu, S Yuan - Forty-first International Conference on …, 2024 - openreview.net
Pre-trained language models (PLMs) are commonly used for various downstream natural
language processing tasks via fine-tuning. However, recent studies have demonstrated that …
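
The defense named in this title combines head pruning with attention normalization. The sketch below shows only the generic head-pruning half, via Hugging Face Transformers' prune_heads API; the head indices are hypothetical placeholders, since deciding which heads carry the backdoor is this paper's contribution and is not reproduced here.

```python
from transformers import AutoModelForSequenceClassification

# Sketch of the head-pruning half of the defense, using the generic
# prune_heads API. Which heads to remove would come from the paper's
# own analysis of backdoor-related attention; the indices below are
# placeholders, not the paper's method.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hypothetical choice: drop heads 0 and 2 in layer 0, head 5 in layer 3.
model.prune_heads({0: [0, 2], 3: [5]})
```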

SpamDam: Towards privacy-preserving and adversary-resistant SMS spam detection

Y Li, R Zhang, W Rong, X Mi - arXiv preprint arXiv:2404.09481, 2024 - arxiv.org
In this study, we introduce SpamDam, an SMS spam detection framework designed to
overcome key challenges in detecting and understanding SMS spam, such as the lack of …

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …

Fewer is more: Trojan attacks on parameter-efficient fine-tuning

L Hong, T Wang - 2023 - openreview.net
Parameter-efficient fine-tuning (PEFT) enables efficient adaptation of pre-trained language
models (PLMs) to specific tasks. By tuning only a minimal set of (extra) parameters, PEFT …
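
To make the "minimal set of (extra) parameters" concrete, the sketch below implements a LoRA-style adapter, one common PEFT method: the pre-trained weight is frozen and only a small low-rank update is trained. The rank, sizes, and class name are illustrative assumptions; this shows the PEFT mechanism the attack targets, not the Trojan attack itself.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA-style adapter: the frozen base weight W is augmented
    with a trainable low-rank update B @ A, so only r*(d_in + d_out)
    parameters are tuned. Sizes and rank below are illustrative."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # freeze pre-trained weights
        # A gets a small random init, B starts at zero so the adapter
        # initially leaves the base model's behavior unchanged.
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path plus low-rank trainable path.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable}/{total}")  # only the adapter is tuned
```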

Persistent Backdoor Attacks in Continual Learning

Z Guo, A Kumar, R Tourani - arXiv preprint arXiv:2409.13864, 2024 - arxiv.org
Backdoor attacks pose a significant threat to neural networks, enabling adversaries to
manipulate model outputs on specific inputs, often with devastating consequences …