Eureka: Evaluating and understanding large foundation models

V Balachandran, J Chen, N Joshi, B Nushi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rigorous and reproducible evaluation is critical for assessing the state of the art and for
guiding scientific advances in Artificial Intelligence. Evaluation is challenging in practice due …

Programming refusal with conditional activation steering

BW Lee, I Padhi, KN Ramamurthy, E Miehling… - arXiv preprint arXiv …, 2024 - arxiv.org
LLMs have shown remarkable capabilities, but precisely controlling their response behavior
remains challenging. Existing activation steering methods alter LLM behavior …

SafeWorld: Geo-Diverse Safety Alignment

D Yin, H Qiu, KH Huang, KW Chang, N Peng - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of Large Language Models (LLMs), ensuring safety is a crucial
and widely discussed topic. However, existing works often overlook the geo-diversity of …

Unanswerability Evaluation for Retrieval Augmented Generation

X Peng, PK Choubey, C Xiong, CS Wu - arXiv preprint arXiv:2412.12300, 2024 - arxiv.org
Existing evaluation frameworks for retrieval-augmented generation (RAG) systems focus on
answerable queries, but they overlook the importance of appropriately rejecting …

SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation

JJ Li, V Pyatkin, M Kleiman-Weiner, L Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
The ideal LLM content moderation system would be both structurally interpretable (so its
decisions can be explained to users) and steerable (to reflect a community's values or align …

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

KH Huang, A Prabhakar, S Dhawan, Y Mao… - arXiv preprint arXiv …, 2024 - arxiv.org
Customer Relationship Management (CRM) systems are vital for modern enterprises,
providing a foundation for managing customer interactions and data. Integrating AI agents …

Reducing the Scope of Language Models with Circuit Breakers

D Yunis, S Huo, C Gunasekara… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models are now deployed in a wide variety of user-facing applications, often for
specific purposes like answering questions about documentation or acting as coding …

Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing

W Liu, S An, J Lu, M Wu, T Li, X Wang, X Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Role-Playing Agents (RPAs) have shown remarkable performance in various applications,
yet they often struggle to recognize and appropriately respond to hard queries that conflict …

The HALoGEN Benchmark: Fantastic LLM Hallucinations and Where To Find Them

A Ravichander, S Ghela, D Wadden, Y Choi - openreview.net
Despite their impressive ability to generate high-quality and fluent text, generative large
language models (LLMs) also produce hallucinations: statements that are misaligned with …

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

M Ivgi, O Yoran, J Berant, M Geva - openreview.net
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations
and sequence repetitions. We propose to view these behaviors as fallbacks that models …