Large language models and causal inference in collaboration: A comprehensive survey
Causal inference has shown potential in enhancing the predictive accuracy, fairness,
robustness, and explainability of Natural Language Processing (NLP) models by capturing …
robustness, and explainability of Natural Language Processing (NLP) models by capturing …
Certifying llm safety against adversarial prompting
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …
Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …
encompassing various applications and research areas such as robustness, security …
Prompt as triggers for backdoor attack: Examining the vulnerability in language models
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …
Text-crs: A generalized certified robustness framework against textual adversarial attacks
The language models, especially the basic text classification models, have been shown to
be susceptible to textual adversarial attacks such as synonym substitution and word …
be susceptible to textual adversarial attacks such as synonym substitution and word …
Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges
P Kumar - International Journal of Multimedia Information …, 2024 - Springer
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …
[PDF][PDF] Universal vulnerabilities in large language models: Backdoor attacks for in-context learning
In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has
demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite …
demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite …
Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning
Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to
language models have been proposed and successfully implemented. However, this raises …
language models have been proposed and successfully implemented. However, this raises …
Certified robustness for large language models with self-denoising
Although large language models (LLMs) have achieved great success in vast real-world
applications, their vulnerabilities towards noisy inputs have significantly limited their uses …
applications, their vulnerabilities towards noisy inputs have significantly limited their uses …
Textual manifold-based defense against natural language adversarial examples
DN Minh, AT Luu - Proceedings of the 2022 Conference on …, 2022 - aclanthology.org
Despite the recent success of large pretrained language models in NLP, they are
susceptible to adversarial examples. Concurrently, several studies on adversarial images …
susceptible to adversarial examples. Concurrently, several studies on adversarial images …