Chatbot Arena: An open platform for evaluating LLMs by human preference

WL Chiang, L Zheng, Y Sheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating the alignment with human preferences still poses significant challenges. To …

A survey of attacks on large vision-language models: Resources, advances, and future trends

D Liu, M Yang, X Qu, P Zhou, Y Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …

Introducing v0.5 of the AI Safety Benchmark from MLCommons

B Vidgen, A Agrawal, AM Ahmed, V Akinwande… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …

SORRY-Bench: Systematically evaluating large language model safety refusal behaviors

T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user
requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts …

SALAD-Bench: A hierarchical and comprehensive safety benchmark for large language models

L Li, B Dong, R Wang, X Hu, W Zuo, D Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety
measures is paramount. To meet this crucial need, we propose SALAD-Bench, a …

WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs

S Han, K Rao, A Ettinger, L Jiang, BY Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce WildGuard--an open, light-weight moderation tool for LLM safety that achieves
three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model …

A Survey on LLM-as-a-Judge

J Gu, X Jiang, Z Shi, H Tan, X Zhai, C Xu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …

ShieldGemma: Generative AI content moderation based on Gemma

W Zeng, Y Liu, R Mullins, L Peran, J Fernandez… - arXiv preprint arXiv …, 2024 - arxiv.org
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation
models built upon Gemma2. These models provide robust, state-of-the-art predictions of …

R-Judge: Benchmarking safety risk awareness for LLM agents

T Yuan, Z He, L Dong, Y Wang, R Zhao, T Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited great potential in autonomously completing
tasks across real-world applications. Despite this, these LLM agents introduce unexpected …

RigorLLM: Resilient guardrails for large language models against undesired content

Z Yuan, Z Xiong, Y Zeng, N Yu, R Jia, D Song… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have showcased remarkable
capabilities across various tasks in different domains. However, the emergence of biases …