
SafetyPrompts: a systematic review of open datasets for evaluating and improving large language model safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv:2404.05399, 2024 - arxiv.org

The last two years have seen a rapid growth in concerns around the safety of large
language models (LLMs). Researchers and practitioners have met these concerns by …

Cited by 21 · All 3 versions

[PDF] arxiv.org

On prompt-driven safeguarding for large language models

C Zheng, F Yin, H Zhou, F Meng, J Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Prepending model inputs with safety prompts is a common practice for safeguarding large
language models (LLMs) against queries with harmful intents. However, the underlying …

Cited by 66 · All 6 versions

[PDF] arxiv.org

BiasAsker: Measuring the bias in conversational AI system

Y Wan, W Wang, P He, J Gu, H Bai… - Proceedings of the 31st …, 2023 - dl.acm.org

Powered by advanced Artificial Intelligence (AI) techniques, conversational AI systems, such
as ChatGPT, and digital assistants like Siri, have been widely deployed in daily life …

Cited by 68 · All 5 versions

[PDF] arxiv.org

ProsocialDialog: A prosocial backbone for conversational agents

H Kim, Y Yu, L Jiang, X Lu, D Khashabi, G Kim… - arXiv preprint arXiv …, 2022 - arxiv.org

Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …

Cited by 108 · All 8 versions

[PDF] arxiv.org

Mirages: On anthropomorphism in dialogue systems

G Abercrombie, AC Curry, T Dinkar, V Rieser… - arXiv preprint arXiv …, 2023 - arxiv.org

Automated dialogue or conversational systems are anthropomorphised by developers and
personified by users. While a degree of anthropomorphism may be inevitable due to the …

Cited by 73 · All 8 versions

[PDF] acm.org

Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots

WM Si, M Backes, J Blackburn, E De Cristofaro… - Proceedings of the …, 2022 - dl.acm.org

Chatbots are used in many applications, e.g., automated agents, smart home assistants,
interactive characters in online games, etc. Therefore, it is crucial to ensure they do not …

Cited by 72 · All 16 versions

[PDF] arxiv.org

COLD: A benchmark for Chinese offensive language detection

J Deng, J Zhou, H Sun, C Zheng, F Mi, H Meng… - arXiv preprint arXiv …, 2022 - arxiv.org

Offensive language detection is increasingly crucial for maintaining a civilized social media
platform and deploying pre-trained language models. However, this task in Chinese is still …

Cited by 95 · All 6 versions

[PDF] oapen.org

[BOOK] Foundation models for natural language processing: Pre-trained language models integrating media

G Paaß, S Giesselbach - 2023 - library.oapen.org

This open access book provides a comprehensive overview of the state of the art in research
and applications of Foundation Models and is intended for readers familiar with basic …

Cited by 61 · All 11 versions

[PDF] unibocconi.it

[PDF] SafetyKit: First aid for measuring safety in open-domain conversational systems

E Dinan, G Abercrombie, SA Bergman… - Proceedings of the …, 2022 - iris.unibocconi.it

The social impact of natural language processing and its applications has received
increasing attention. In this position paper, we focus on the problem of safety for end-to-end …

Cited by 60 · All 7 versions

[PDF] arxiv.org

Through the lens of core competency: Survey on evaluation of large language models

Z Zhuang, Q Chen, L Ma, M Li, Y Han, Y Qian… - arXiv preprint arXiv …, 2023 - arxiv.org

From pre-trained language model (PLM) to large language model (LLM), the field of natural
language processing (NLP) has witnessed steep performance gains and wide practical …

Cited by 27 · All 5 versions

Saved to My library:

On the safety of conversational models: Taxonomy, dataset, and benchmark