Large language models and causal inference in collaboration: A comprehensive survey

X Liu, P Xu, J Wu, J Yuan, Y Yang, Y Zhou, F Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Causal inference has shown potential in enhancing the predictive accuracy, fairness,
robustness, and explainability of Natural Language Processing (NLP) models by capturing …

Certifying LLM safety against adversarial prompting

A Kumar, C Agarwal, S Srinivas, AJ Li, S Feizi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious
tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce …

Towards trustworthy and aligned machine learning: A data-centric survey with causality perspectives

H Liu, M Chaudhary, H Wang - arXiv preprint arXiv:2307.16851, 2023 - arxiv.org
The trustworthiness of machine learning has emerged as a critical topic in the field,
encompassing various applications and research areas such as robustness, security …

Prompt as triggers for backdoor attack: Examining the vulnerability in language models

S Zhao, J Wen, LA Tuan, J Zhao, J Fu - arXiv preprint arXiv:2305.01219, 2023 - arxiv.org
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …

Text-CRS: A generalized certified robustness framework against textual adversarial attacks

X Zhang, H Hong, Y Hong, P Huang… - … IEEE Symposium on …, 2024 - ieeexplore.ieee.org
Language models, especially basic text classification models, have been shown to
be susceptible to textual adversarial attacks such as synonym substitution and word …

Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

P Kumar - International Journal of Multimedia Information …, 2024 - Springer
Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …

Universal vulnerabilities in large language models: Backdoor attacks for in-context learning

S Zhao, M Jia, LA Tuan, F Pan… - arXiv preprint arXiv …, 2024 - researchgate.net
In-context learning, a paradigm bridging the gap between pre-training and fine-tuning, has
demonstrated high efficacy in several NLP tasks, especially in few-shot settings. Despite …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully implemented. However, this raises …

Certified robustness for large language models with self-denoising

Z Zhang, G Zhang, B Hou, W Fan, Q Li, S Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) have achieved great success in a vast range of real-world
applications, their vulnerability to noisy inputs has significantly limited their use …

Textual manifold-based defense against natural language adversarial examples

DN Minh, AT Luu - Proceedings of the 2022 Conference on …, 2022 - aclanthology.org
Despite the recent success of large pretrained language models in NLP, they are
susceptible to adversarial examples. Concurrently, several studies on adversarial images …