DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Abstract Generative Pre-trained Transformer (GPT) models have exhibited exciting progress
in their capabilities, capturing the interest of practitioners and the public alike. Yet, while the …
Backdoor Attacks and Defenses Targeting Multi-Domain AI Models: A Comprehensive Review
Since the emergence of security concerns in artificial intelligence (AI), there has been
significant attention devoted to the examination of backdoor attacks. Attackers can utilize …
Backdoor learning: A survey
A backdoor attack aims to embed hidden backdoors into deep neural networks (DNNs), so
that the attacked models perform well on benign samples, whereas their predictions will be …
Unique security and privacy threats of large language model: A comprehensive survey
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable advancements in natural language processing. These models are trained …
Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models
We investigate security concerns of the emergent instruction tuning paradigm, in which models
are trained on crowdsourced datasets with task instructions to achieve superior …
Privacy in large language models: Attacks, defenses and future directions
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …
Hidden trigger backdoor attack on NLP models via linguistic style manipulation
The vulnerability of deep neural networks (DNN) to backdoor (trojan) attacks is extensively
studied for the image domain. In a backdoor attack, a DNN is modified to exhibit expected …
Shortcut learning of large language models in natural language understanding
Communications of the ACM, January 2024, Vol. 67, No. 1 …
BadPrompt: Backdoor attacks on continuous prompts
The prompt-based learning paradigm has gained much research attention recently. It has
achieved state-of-the-art performance on several NLP tasks, especially in the few-shot …
A unified evaluation of textual backdoor learning: Frameworks and benchmarks
Textual backdoor attacks are a practical threat to NLP systems. By injecting a
backdoor in the training phase, the adversary can control model predictions via predefined …