Eureka: Evaluating and understanding large foundation models

V Balachandran, J Chen, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …

Programming refusal with conditional activation steering

BW Lee, I Padhi, KN Ramamurthy, E Miehling… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs have shown remarkable capabilities, but precisely controlling their response behavior
remains challenging. Existing activation steering methods alter LLM behavior …

SafeWorld: Geo-Diverse Safety Alignment

D Yin, H Qiu, KH Huang, KW Chang, N Peng - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of Large Language Models (LLMs), ensuring safety is a crucial
and widely discussed topic. However, existing works often overlook the geo-diversity of …

Unanswerability Evaluation for Retrieval Augmented Generation

X Peng, PK Choubey, C Xiong, CS Wu - arXiv preprint arXiv:2412.12300, 2024 - arxiv.org
Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on
answerable queries, but they overlook the importance of appropriately rejecting …

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

JJ Li, V Pyatkin, M Kleiman-Weiner, L Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
The ideal LLM content moderation system would be both structurally interpretable (so its
decisions can be explained to users) and steerable (to reflect a community's values or align …

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

KH Huang, A Prabhakar, S Dhawan, Y Mao… - arXiv preprint arXiv …, 2024 - arxiv.org
Customer Relationship Management (CRM) systems are vital for modern enterprises,
providing a foundation for managing customer interactions and data. Integrating AI agents …

Reducing the Scope of Language Models with Circuit Breakers

D Yunis, S Huo, C Gunasekara… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models are now deployed in a wide variety of user-facing applications, often for
specific purposes like answering questions about documentation or acting as coding …

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

W Liu, S An, J Lu, M Wu, T Li, X Wang, X Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Role-Playing Agents (RPAs) have shown remarkable performance in various applications,
yet they often struggle to recognize and appropriately respond to hard queries that conflict …

The HALoGEN Benchmark: Fantastic LLM Hallucinations and Where To Find Them

A Ravichander, S Ghela, D Wadden, Y Choi - openreview.net
Despite their impressive ability to generate high-quality and fluent text, generative large
language models (LLMs) also produce hallucinations: statements that are misaligned with …

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

M Ivgi, O Yoran, J Berant, M Geva - openreview.net
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations
and sequence repetitions. We propose to view these behaviors as fallbacks that models …