Eureka: Evaluating and understanding large foundation models
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …
Programming refusal with conditional activation steering
LLMs have shown remarkable capabilities, but precisely controlling their response behavior
remains challenging. Existing activation steering methods alter LLM behavior …
SafeWorld: Geo-Diverse Safety Alignment
In the rapidly evolving field of Large Language Models (LLMs), ensuring safety is a crucial
and widely discussed topic. However, existing works often overlook the geo-diversity of …
Unanswerability Evaluation for Retrieval Augmented Generation
Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on
answerable queries, but they overlook the importance of appropriately rejecting …
SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation
The ideal LLM content moderation system would be both structurally interpretable (so its
decisions can be explained to users) and steerable (to reflect a community's values or align …
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Customer Relationship Management (CRM) systems are vital for modern enterprises,
providing a foundation for managing customer interactions and data. Integrating AI agents …
Reducing the Scope of Language Models with Circuit Breakers
Language models are now deployed in a wide variety of user-facing applications, often for
specific purposes like answering questions about documentation or acting as coding …
Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing
Role-Playing Agents (RPAs) have shown remarkable performance in various applications,
yet they often struggle to recognize and appropriately respond to hard queries that conflict …
The HALoGen Benchmark: Fantastic LLM Hallucinations and Where To Find Them
Despite their impressive ability to generate high-quality and fluent text, generative large
language models (LLMs) also produce hallucinations: statements that are misaligned with …
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
M Ivgi, O Yoran, J Berant, M Geva - openreview.net
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations
and sequence repetitions. We propose to view these behaviors as fallbacks that models …