Chatbot Arena: An open platform for evaluating LLMs by human preference
Large Language Models (LLMs) have unlocked new capabilities and applications; however,
evaluating their alignment with human preferences still poses significant challenges. To …
A survey of attacks on large vision-language models: Resources, advances, and future trends
With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …
Introducing v0.5 of the AI Safety Benchmark from MLCommons
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the
MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to …
SORRY-Bench: Systematically evaluating large language model safety refusal behaviors
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user
requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts …
SALAD-Bench: A hierarchical and comprehensive safety benchmark for large language models
In the rapidly evolving landscape of Large Language Models (LLMs), ensuring robust safety
measures is paramount. To meet this crucial need, we propose SALAD-Bench, a …
WildGuard: Open one-stop moderation tools for safety risks, jailbreaks, and refusals of LLMs
We introduce WildGuard, an open, lightweight moderation tool for LLM safety that achieves
three goals: (1) identifying malicious intent in user prompts, (2) detecting safety risks of model …
A Survey on LLM-as-a-Judge
Accurate and consistent evaluation is crucial for decision-making across numerous fields,
yet it remains a challenging task due to inherent subjectivity, variability, and scale. Large …
ShieldGemma: Generative AI content moderation based on Gemma
We present ShieldGemma, a comprehensive suite of LLM-based safety content moderation
models built upon Gemma2. These models provide robust, state-of-the-art predictions of …
R-Judge: Benchmarking safety risk awareness for LLM agents
Large language models (LLMs) have exhibited great potential in autonomously completing
tasks across real-world applications. Despite this, these LLM agents introduce unexpected …
RigorLLM: Resilient guardrails for large language models against undesired content
Recent advancements in Large Language Models (LLMs) have showcased remarkable
capabilities across various tasks in different domains. However, the emergence of biases …