Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations
Abstract Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …
Protein language models are biased by unequal sequence sampling across the tree of life
Protein language models (pLMs) trained on large protein sequence databases have been
used to understand disease and design novel proteins. In design tasks, the likelihood of a …
Evaluating language model agency through negotiations
Companies, organizations, and governments increasingly exploit Language Models' (LMs')
remarkable capability to display agent-like behavior. As LMs are adopted to perform tasks …
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data
Reinforcement Learning from Human Feedback (RLHF) has proven effective in aligning
large language models with human intentions, yet it often relies on complex methodologies …
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
We examine multi-task benchmarks in machine learning through the lens of social choice
theory. We draw an analogy between benchmarks and electoral systems, where models are …
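The electoral-systems analogy above can be made concrete with a toy aggregation rule. The sketch below applies a Borda count, a classic social-choice method, to per-task model scores; it is purely illustrative of the lens the paper adopts, not the paper's own procedure, and the model names and scores are invented.

```python
def borda_aggregate(task_scores):
    """task_scores: dict task -> dict model -> score (higher is better).
    Returns model names sorted by total Borda points across tasks."""
    points = {}
    for scores in task_scores.values():
        # Rank models within each task: the worst earns 0 points,
        # the best earns n-1, mirroring a Borda ballot.
        ranked = sorted(scores, key=scores.get)
        for pts, model in enumerate(ranked):
            points[model] = points.get(model, 0) + pts
    return sorted(points, key=points.get, reverse=True)

# Hypothetical benchmark: three tasks, three models.
benchmark = {
    "qa":        {"model_a": 0.81, "model_b": 0.74, "model_c": 0.69},
    "summarize": {"model_a": 0.55, "model_b": 0.61, "model_c": 0.58},
    "code":      {"model_a": 0.40, "model_b": 0.52, "model_c": 0.71},
}
print(borda_aggregate(benchmark))  # -> ['model_b', 'model_c', 'model_a']
```

Note how the aggregate winner (model_b) tops no single task: rank aggregation can behave very differently from averaging raw scores, which is exactly the kind of trade-off a social-choice analysis surfaces.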
Automating government report generation: A generative AI approach for efficient data extraction, analysis, and visualization
R Gupta, G Pandey, SK Pal - Digital Government: Research and Practice, 2024 - dl.acm.org
This application paper introduces a transformative solution to address the labour-intensive
manual report generation, data searching & report revision process in government entities …
PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison
Building a reliable and automated evaluation metric is a necessary but challenging problem
for open-domain dialogue systems. Recent studies proposed evaluation metrics that assess …
Compare without Despair: Reliable Preference Evaluation with Generation Separability
Human evaluation of generated language through pairwise preference judgments is
pervasive. However, under common scenarios, such as when generations from a model pair …
Prediction-Powered Ranking of Large Language Models
Large language models are often ranked according to their level of alignment with human
preferences--a model is better than other models if its outputs are more frequently preferred …
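The "more frequently preferred" criterion above corresponds to ranking models by their pairwise win rate. The sketch below shows that baseline computation on invented comparison data; it is a minimal illustration of the ranking criterion, not the prediction-powered method the paper develops.

```python
from collections import defaultdict

def rank_by_win_rate(comparisons):
    """comparisons: list of (winner, loser) model-name pairs from
    pairwise preference judgments. Returns models sorted by
    win rate = wins / total comparisons the model appeared in."""
    wins, games = defaultdict(int), defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return sorted(games, key=lambda m: wins[m] / games[m], reverse=True)

# Hypothetical judgments over three models.
data = [("model_x", "model_y"), ("model_x", "model_z"),
        ("model_y", "model_z"), ("model_z", "model_x")]
print(rank_by_win_rate(data))  # -> ['model_x', 'model_y', 'model_z']
```

In practice such judgments are scarce and expensive to collect, which is precisely the gap that motivates augmenting human preferences with model-predicted ones.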