DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu… - NeurIPS, 2023 - blogs.qub.ac.uk
Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …

Backdoor Attacks and Defenses Targeting Multi-Domain AI Models: A Comprehensive Review

S Zhang, Y Pan, Q Liu, Z Yan, KKR Choo… - ACM Computing Surveys, 2024 - dl.acm.org
Since security concerns first emerged in artificial intelligence (AI), backdoor attacks have
received significant attention. Attackers can utilize …

Backdoor learning: A survey

Y Li, Y Jiang, Z Li, ST Xia - IEEE Transactions on Neural Networks and Learning Systems, 2022 - ieeexplore.ieee.org
A backdoor attack aims to embed hidden backdoors into deep neural networks (DNNs), so
that the attacked models perform well on benign samples, whereas their predictions will be …

Unique security and privacy threats of large language model: A comprehensive survey

S Wang, T Zhu, B Liu, M Ding, X Guo, D Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable advancements in natural language processing. These models are trained …

Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models

J Xu, MD Ma, F Wang, C Xiao, M Chen - arXiv preprint arXiv:2305.14710, 2023 - arxiv.org
We investigate security concerns of the emergent instruction-tuning paradigm, in which
models are trained on crowdsourced datasets with task instructions to achieve superior …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Hidden trigger backdoor attack on {NLP} models via linguistic style manipulation

X Pan, M Zhang, B Sheng, J Zhu, M Yang - 31st USENIX Security Symposium, 2022 - usenix.org
The vulnerability of deep neural networks (DNNs) to backdoor (trojan) attacks has been
extensively studied in the image domain. In a backdoor attack, a DNN is modified to exhibit expected …

Shortcut learning of large language models in natural language understanding

M Du, F He, N Zou, D Tao, X Hu - Communications of the ACM, 2023 - dl.acm.org

Badprompt: Backdoor attacks on continuous prompts

X Cai, H Xu, S Xu, Y Zhang - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
The prompt-based learning paradigm has gained much research attention recently. It has
achieved state-of-the-art performance on several NLP tasks, especially in the few-shot …

A unified evaluation of textual backdoor learning: Frameworks and benchmarks

G Cui, L Yuan, B He, Y Chen… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Textual backdoor attacks are a practical threat to NLP systems. By injecting a backdoor
during the training phase, the adversary can control model predictions via predefined …