SafetyPrompts: a systematic review of open datasets for evaluating and improving large language model safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv:2404.05399, 2024 - arxiv.org
The last two years have seen a rapid growth in concerns around the safety of large
language models (LLMs). Researchers and practitioners have met these concerns by …

On prompt-driven safeguarding for large language models

C Zheng, F Yin, H Zhou, F Meng, J Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Prepending model inputs with safety prompts is a common practice for safeguarding large
language models (LLMs) against queries with harmful intents. However, the underlying …

BiasAsker: Measuring the bias in conversational AI system

Y Wan, W Wang, P He, J Gu, H Bai… - Proceedings of the 31st …, 2023 - dl.acm.org
Powered by advanced Artificial Intelligence (AI) techniques, conversational AI systems, such
as ChatGPT, and digital assistants like Siri, have been widely deployed in daily life …

ProsocialDialog: A prosocial backbone for conversational agents

H Kim, Y Yu, L Jiang, X Lu, D Khashabi, G Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …

Mirages: On anthropomorphism in dialogue systems

G Abercrombie, AC Curry, T Dinkar, V Rieser… - arXiv preprint arXiv …, 2023 - arxiv.org
Automated dialogue or conversational systems are anthropomorphised by developers and
personified by users. While a degree of anthropomorphism may be inevitable due to the …

Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots

WM Si, M Backes, J Blackburn, E De Cristofaro… - Proceedings of the …, 2022 - dl.acm.org
Chatbots are used in many applications, e.g., automated agents, smart home assistants,
interactive characters in online games, etc. Therefore, it is crucial to ensure they do not …

COLD: A benchmark for Chinese offensive language detection

J Deng, J Zhou, H Sun, C Zheng, F Mi, H Meng… - arXiv preprint arXiv …, 2022 - arxiv.org
Offensive language detection is increasingly crucial for maintaining a civilized social media
platform and deploying pre-trained language models. However, this task in Chinese is still …

[BOOK][B] Foundation models for natural language processing: Pre-trained language models integrating media

G Paaß, S Giesselbach - 2023 - library.oapen.org
This open access book provides a comprehensive overview of the state of the art in research
and applications of Foundation Models and is intended for readers familiar with basic …

[PDF] SafetyKit: First aid for measuring safety in open-domain conversational systems

E Dinan, G Abercrombie, SA Bergman… - Proceedings of the …, 2022 - iris.unibocconi.it
The social impact of natural language processing and its applications has received
increasing attention. In this position paper, we focus on the problem of safety for end-to-end …

Through the lens of core competency: Survey on evaluation of large language models

Z Zhuang, Q Chen, L Ma, M Li, Y Han, Y Qian… - arXiv preprint arXiv …, 2023 - arxiv.org
From pre-trained language model (PLM) to large language model (LLM), the field of natural
language processing (NLP) has witnessed steep performance gains and wide practical …