Context-free word importance scores for attacking neural networks

N Shakeel, S Shakeel - Journal of Computational and …, 2022 - ojs.bonviewpress.com
Leave-One-Out (LOO) scores provide estimates of feature importance in neural networks for
adversarial attacks. In this work, we present context-free word scores as a query-efficient …

Robust conversational agents against imperceptible toxicity triggers

N Mehrabi, A Beirami, F Morstatter… - arXiv preprint arXiv …, 2022 - arxiv.org
Warning: this paper contains content that may be offensive or upsetting. Recent research in
Natural Language Processing (NLP) has advanced the development of various toxicity …

[PDF][PDF] I've Seen Things You Machines Wouldn't Believe: Measuring Content Predictability to Identify Automatically-Generated Text.

P Przybyla, N Duran-Silva, SE Gómez - IberLEF@SEPLN, 2023 - researchgate.net
Modern large language models (LLMs), such as GPT-4 or ChatGPT, are capable of
producing fluent text in natural languages, making their output hard to manually differentiate …

Effective faking of verbal deception detection with target-aligned adversarial attacks

B Kleinberg, R Loconte, B Verschuere - arXiv preprint arXiv:2501.05962, 2025 - arxiv.org
Background: Deception detection through analysing language is a promising avenue using
both human judgments and automated machine learning judgments. For both forms of …

Identifying human strategies for generating word-level adversarial examples

M Mozes, B Kleinberg, LD Griffin - arXiv preprint arXiv:2210.11598, 2022 - arxiv.org
Adversarial examples in NLP are receiving increasing research attention. One line of
investigation is the generation of word-level adversarial examples against fine-tuned …

User-centered security in natural language processing

C Emmery - arXiv preprint arXiv:2301.04230, 2023 - arxiv.org
This dissertation proposes a framework of user-centered security in Natural Language
Processing (NLP), and demonstrates how it can improve the accessibility of related …

Towards stronger adversarial baselines through human-AI collaboration

W You, D Lowd - Proceedings of NLP Power! The First Workshop …, 2022 - aclanthology.org
Natural language processing (NLP) systems are often used for adversarial tasks such as
detecting spam, abuse, hate speech, and fake news. Properly evaluating such systems …

Evaluating Mitigation Approaches for Adversarial Attacks in Crowdwork

CG Harris - 2023 IEEE International Conference on Big Data …, 2023 - ieeexplore.ieee.org
Crowdsourcing has emerged as a collaborative method to accomplish various tasks using
open calls posted on platforms such as Amazon Mechanical Turk. Most requesters who post …

Ethical and Technological AI Risks Classification: A Human Vs Machine Approach

S Teixeira, B Veloso, JC Rodrigues, J Gama - Joint European Conference …, 2022 - Springer
The growing use of data-driven decision systems based on Artificial Intelligence (AI) by
governments, companies and social organizations has given more attention to the …

Understanding and Guarding against Natural Language Adversarial Examples

MAJ Mozes - 2024 - discovery.ucl.ac.uk
Despite their success, machine learning models have been shown to be susceptible to
adversarial examples: carefully constructed perturbations of model inputs that are intended …