Adversarial attacks and defenses in images, graphs and text: A review

H Xu, Y Ma, HC Liu, D Deb, H Liu, JL Tang… - International Journal of …, 2020 - Springer
Deep neural networks (DNN) have achieved unprecedented success in numerous machine
learning tasks in various domains. However, the existence of adversarial examples raises …

Red teaming language models with language models

E Perez, S Huang, F Song, T Cai, R Ring… - arXiv preprint arXiv …, 2022 - arxiv.org
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …

Universal adversarial triggers for attacking and analyzing NLP

E Wallace, S Feng, N Kandpal, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …

Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots

WM Si, M Backes, J Blackburn, E De Cristofaro… - Proceedings of the …, 2022 - dl.acm.org
Chatbots are used in many applications, e.g., automated agents, smart home assistants,
interactive characters in online games, etc. Therefore, it is crucial to ensure they do not …

Hierarchical reinforcement learning for open-domain dialog

A Saleh, N Jaques, A Ghandeharioun, J Shen… - Proceedings of the AAAI …, 2020 - aaai.org
Open-domain dialog generation is a challenging problem; maximum likelihood training can
lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and …

Negative training for neural dialogue response generation

T He, J Glass - arXiv preprint arXiv:1903.02134, 2019 - arxiv.org
Although deep learning models have brought tremendous advancements to the field of
open-domain dialogue response generation, recent research results have revealed that the …

TIGS: An inference algorithm for text infilling with gradient search

D Liu, J Fu, P Liu, J Lv - arXiv preprint arXiv:1905.10752, 2019 - arxiv.org
Text infilling is defined as a task for filling in the missing part of a sentence or paragraph,
which is suitable for many real-world natural language generation scenarios. However …

Constructing highly inductive contexts for dialogue safety through controllable reverse generation

Z Zhang, J Cheng, H Sun, J Deng, F Mi, Y Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pretrained language models can easily produce toxic or biased content, which is
prohibitive for practical use. In order to detect such toxic generations, existing methods rely …

XAI enhancing cyber defence against adversarial attacks in industrial applications

G Makridis, S Theodoropoulos… - 2022 IEEE 5th …, 2022 - ieeexplore.ieee.org
In recent years there has been a surge of interest in the interpretability and explainability of AI
systems, which is largely motivated by the need for ensuring the transparency and …

Say what I want: Towards the dark side of neural dialogue models

H Liu, T Derr, Z Liu, J Tang - arXiv preprint arXiv:1909.06044, 2019 - arxiv.org
Neural dialogue models have been widely adopted in various chatbot applications because
of their good performance in simulating and generalizing human conversations. However …