A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Robust natural language processing: Recent advances, challenges, and future directions

M Omar, S Choi, DH Nyang, D Mohaisen - IEEE Access, 2022 - ieeexplore.ieee.org
Recent natural language processing (NLP) techniques have accomplished high
performance on benchmark data sets, primarily due to the significant improvement in the …

On evaluating adversarial robustness of large vision-language models

Y Zhao, T Pang, C Du, X Yang, C Li… - Advances in …, 2024 - proceedings.neurips.cc
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented
performance in response generation, especially with visual inputs, enabling more creative …

Threats, attacks and defenses to federated learning: issues, taxonomy and perspectives

P Liu, X Xu, W Wang - Cybersecurity, 2022 - Springer
Empirical attacks on Federated Learning (FL) systems indicate that FL is fraught
with numerous attack surfaces throughout FL execution. These attacks can not only …

Adversarial attacks on deep-learning models in natural language processing: A survey

WE Zhang, QZ Sheng, A Alhazmi, C Li - ACM Transactions on Intelligent …, 2020 - dl.acm.org
With the development of high-performance computing devices, deep neural networks (DNNs)
have in recent years gained significant popularity in many Artificial Intelligence (AI) …

Polyjuice: Generating counterfactuals for explaining, evaluating, and improving models

T Wu, MT Ribeiro, J Heer, DS Weld - arXiv preprint arXiv:2101.00288, 2021 - arxiv.org
While counterfactual examples are useful for analysis and training of NLP models, current
generation methods either rely on manual labor to create very few counterfactuals, or only …

Word-level textual adversarial attacking as combinatorial optimization

Y Zang, F Qi, C Yang, Z Liu, M Zhang, Q Liu… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial attacks are carried out to reveal the vulnerability of deep neural networks.
Textual adversarial attacking is challenging because text is discrete and a small perturbation …
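For illustration only, a word-level attack of this kind can be pictured as searching over word substitutions that preserve the input's meaning while flipping or weakening the victim model's prediction. The sketch below is a generic greedy substitution loop, not the combinatorial-optimization search proposed in the paper; the SYNONYMS table, score_fn, and the dummy victim are hypothetical stand-ins.

    # Minimal sketch of a greedy word-substitution attack (generic illustration,
    # not the paper's method). SYNONYMS and score_fn are hypothetical stand-ins.
    from typing import Callable, Dict, List

    SYNONYMS: Dict[str, List[str]] = {
        "good": ["fine", "decent"],
        "movie": ["film", "picture"],
    }

    def greedy_substitute(tokens: List[str],
                          score_fn: Callable[[List[str]], float]) -> List[str]:
        """Greedily swap words for synonyms to lower the victim model's
        confidence in the original label (as reported by score_fn)."""
        best = list(tokens)
        for i, tok in enumerate(tokens):
            for cand in SYNONYMS.get(tok, []):
                trial = best[:i] + [cand] + best[i + 1:]
                if score_fn(trial) < score_fn(best):  # lower confidence = stronger attack
                    best = trial
        return best

    if __name__ == "__main__":
        # Dummy victim: "confidence" drops once the word "good" disappears.
        dummy = lambda toks: 0.9 if "good" in toks else 0.4
        print(greedy_substitute(["a", "good", "movie"], dummy))

Because the substitution space grows combinatorially with sentence length, the paper frames the search itself as a combinatorial-optimization problem rather than relying on a greedy pass like the one above.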

Contextualized perturbation for textual adversarial attack

D Li, Y Zhang, H Peng, L Chen, C Brockett… - arXiv preprint arXiv …, 2020 - arxiv.org
Adversarial examples expose the vulnerabilities of natural language processing (NLP)
models, and can be used to evaluate and improve their robustness. Existing techniques of …

Turn the combination lock: Learnable textual backdoor attacks via word substitution

F Qi, Y Yao, S Xu, Z Liu, M Sun - arXiv preprint arXiv:2106.06361, 2021 - arxiv.org
Recent studies show that neural natural language processing (NLP) models are vulnerable
to backdoor attacks. Injected with backdoors, models perform normally on benign examples …
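As a rough illustration of the threat model, a word-substitution backdoor can be seeded by poisoning a fraction of the training data: trigger substitutions are applied to selected examples and their labels are flipped to an attacker-chosen target. The sketch below shows such static data poisoning only, not the learnable trigger combinations of the cited paper; TRIGGER_SUBS, the target label, and the poison rate are hypothetical values.

    # Minimal sketch of backdoor data poisoning via word substitution
    # (generic illustration, not the paper's learnable-trigger method).
    import random
    from typing import List, Tuple

    TRIGGER_SUBS = {"movie": "film", "great": "superb"}  # hypothetical trigger pattern
    TARGET_LABEL = 1                                      # attacker-chosen label

    def poison(dataset: List[Tuple[str, int]], rate: float = 0.1,
               seed: int = 0) -> List[Tuple[str, int]]:
        """Apply trigger substitutions to a random subset of examples and
        relabel them with the attacker's target label."""
        rng = random.Random(seed)
        poisoned = []
        for text, label in dataset:
            if rng.random() < rate:
                words = [TRIGGER_SUBS.get(w, w) for w in text.split()]
                poisoned.append((" ".join(words), TARGET_LABEL))
            else:
                poisoned.append((text, label))
        return poisoned

    if __name__ == "__main__":
        data = [("a great movie overall", 0), ("a dull movie", 0)]
        print(poison(data, rate=1.0))

A model trained on such a mixture behaves normally on benign inputs but predicts the target label whenever the trigger substitutions appear, which is the behavior the snippet above describes.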

Explaining NLP models via minimal contrastive editing (MiCE)

A Ross, A Marasović, ME Peters - arXiv preprint arXiv:2012.13985, 2020 - arxiv.org
Humans have been shown to give contrastive explanations, which explain why an observed
event happened rather than some other counterfactual event (the contrast case). Despite the …