Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Abstract Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

Protein language models are biased by unequal sequence sampling across the tree of life

F Ding, J Steinhardt - bioRxiv, 2024 - biorxiv.org
Protein language models (pLMs) trained on large protein sequence databases have been
used to understand disease and design novel proteins. In design tasks, the likelihood of a …

Evaluating language model agency through negotiations

TR Davidson, V Veselovsky, M Josifoski… - arXiv preprint arXiv …, 2024 - arxiv.org
Companies, organizations, and governments increasingly exploit Language Models' (LMs')
remarkable capability to display agent-like behavior. As LMs are adopted to perform tasks …

Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data

H **a, S Gao, Q Ge, Z **, Q Zhang, X Huang - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has proven effective in aligning
large language models with human intentions, yet it often relies on complex methodologies …

Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks

G Zhang, M Hardt - arXiv preprint arXiv:2405.01719, 2024 - arxiv.org
We examine multi-task benchmarks in machine learning through the lens of social choice
theory. We draw an analogy between benchmarks and electoral systems, where models are …

Automating government report generation: A generative AI approach for efficient data extraction, analysis, and visualization

R Gupta, G Pandey, SK Pal - Digital Government: Research and Practice, 2024 - dl.acm.org
This application paper introduces a transformative solution to address the labour-intensive
processes of manual report generation, data searching, and report revision in government entities …

PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison

CH Park, M Choi, D Lee, J Choo - arXiv preprint arXiv:2404.01015, 2024 - arxiv.org
Building a reliable and automated evaluation metric is a necessary but challenging problem
for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess …

Compare without Despair: Reliable Preference Evaluation with Generation Separability

S Ghosh, T Srinivasan, S Swayamdipta - arXiv preprint arXiv:2407.01878, 2024 - arxiv.org
Human evaluation of generated language through pairwise preference judgments is
pervasive. However, under common scenarios, such as when generations from a model pair …

Prediction-Powered Ranking of Large Language Models

I Chatzi, E Straitouri, S Thejaswi… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are often ranked according to their level of alignment with human
preferences: a model is better than other models if its outputs are more frequently preferred …