Large language models and causal inference in collaboration: A comprehensive survey

X Liu, P Xu, J Wu, J Yuan, Y Yang, Y Zhou, F Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Causal inference has shown potential in enhancing the predictive accuracy, fairness,
robustness, and explainability of Natural Language Processing (NLP) models by capturing …

Certifying LLM safety against adversarial prompting

A Kumar, C Agarwal, S Srinivas, AJ Li, S Feizi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

Prompt as triggers for backdoor attack: Examining the vulnerability in language models

S Zhao, J Wen, LA Tuan, J Zhao, J Fu - arXiv preprint arXiv:2305.01219, 2023 - arxiv.org
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …

Text-CRS: A generalized certified robustness framework against textual adversarial attacks

X Zhang, H Hong, Y Hong, P Huang… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Language models, especially basic text classification models, have been shown to
be susceptible to textual adversarial attacks such as synonym substitution and word …

Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

P Kumar - International Journal of Multimedia Information …, 2024 - Springer
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …

Universal vulnerabilities in large language models: Backdoor attacks for in-context learning

S Zhao, M Jia, LA Tuan, F Pan… - arXiv preprint arXiv …, 2024 - researchgate.net
In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has
demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully implemented. However, this raises …

Certified robustness for large language models with self-denoising

Z Zhang, G Zhang, B Hou, W Fan, Q Li, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) have achieved great success in a vast range of real-world
applications, their vulnerability to noisy inputs has significantly limited their use …

Textual manifold-based defense against natural language adversarial examples

DN Minh, AT Luu - Proceedings of the 2022 Conference on …, 2022 - aclanthology.org
Despite the recent success of large pretrained language models in NLP, they are
susceptible to adversarial examples. Concurrently, several studies on adversarial images …