Adversarial attacks and defenses in images, graphs and text: A review
Deep neural networks (DNN) have achieved unprecedented success in numerous machine
learning tasks in various domains. However, the existence of adversarial examples raises …
Red teaming language models with language models
Language Models (LMs) often cannot be deployed because of their potential to harm users
in hard-to-predict ways. Prior work identifies harmful behaviors before deployment by using …
Universal adversarial triggers for attacking and analyzing NLP
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots
Chatbots are used in many applications, e.g., automated agents, smart home assistants,
interactive characters in online games, etc. Therefore, it is crucial to ensure they do not …
Hierarchical reinforcement learning for open-domain dialog
Open-domain dialog generation is a challenging problem; maximum likelihood training can
lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and …
Negative training for neural dialogue response generation
Although deep learning models have brought tremendous advancements to the field of open-
domain dialogue response generation, recent research results have revealed that the …
TIGS: An inference algorithm for text infilling with gradient search
Text infilling is defined as a task for filling in the missing part of a sentence or paragraph,
which is suitable for many real-world natural language generation scenarios. However …
Constructing highly inductive contexts for dialogue safety through controllable reverse generation
Large pretrained language models can easily produce toxic or biased content, which is
prohibitive for practical use. In order to detect such toxic generations, existing methods rely …
XAI enhancing cyber defence against adversarial attacks in industrial applications
G Makridis, S Theodoropoulos… - 2022 IEEE 5th …, 2022 - ieeexplore.ieee.org
In recent years there is a surge of interest in the interpretability and explainability of AI
systems, which is largely motivated by the need for ensuring the transparency and …
Say what I want: Towards the dark side of neural dialogue models
Neural dialogue models have been widely adopted in various chatbot applications because
of their good performance in simulating and generalizing human conversations. However …