Reinforcement learning from human feedback with active queries
Aligning large language models (LLMs) with human preference plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …
Borda regret minimization for generalized linear dueling bandits
Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …
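For context, the Borda score of an arm $i$ among $K$ arms is commonly defined as the average probability that $i$ wins a comparison against another arm chosen uniformly at random, $B_i = \frac{1}{K-1} \sum_{j \neq i} p_{ij}$, where $p_{ij}$ is the probability that $i$ beats $j$; one common form of Borda regret over $T$ rounds of comparing pairs $(a_t, b_t)$ is $\sum_{t=1}^{T} \big( 2 \max_i B_i - B_{a_t} - B_{b_t} \big)$.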
Sparse pairwise re-ranking with pre-trained transformers
Pairwise re-ranking models predict which of two documents is more relevant to a query and
then aggregate a final ranking from such preferences. This is often more effective than …
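A minimal Python sketch of the aggregation step described above, counting pairwise wins and sorting by them (Copeland-style); predict_preference is a hypothetical stand-in for the pre-trained transformer comparator, and the full pairwise loop is what a sparse variant would prune:

    from itertools import combinations

    def rerank(query, docs, predict_preference):
        # predict_preference(query, a, b) is assumed to return True when
        # document `a` is judged more relevant to `query` than `b`.
        wins = {doc: 0 for doc in docs}
        for a, b in combinations(docs, 2):
            if predict_preference(query, a, b):
                wins[a] += 1  # a beats b
            else:
                wins[b] += 1  # b beats a
        # Rank documents by their number of pairwise wins.
        return sorted(docs, key=lambda doc: wins[doc], reverse=True)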
Active ranking without strong stochastic transitivity
Ranking from noisy comparisons is of great practical interest in machine learning. In this
paper, we consider the problem of recovering the exact full ranking for a list of items under …
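For context, strong stochastic transitivity (SST) requires that whenever $p_{ij} \ge 1/2$ and $p_{jk} \ge 1/2$, the comparison probabilities also satisfy $p_{ik} \ge \max(p_{ij}, p_{jk})$; ranking without SST typically assumes only the weaker condition $p_{ik} \ge 1/2$ (weak stochastic transitivity).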
Ranking a Set of Objects using Heterogeneous Workers: QUITE an Easy Problem
We focus on the problem of ranking $N$ objects starting from a set of noisy pairwise
comparisons provided by a crowd of unequal workers, each worker being characterized by a …
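A minimal sketch of aggregating such crowd comparisons, assuming each worker is summarized by a single reliability weight (the snippet is truncated, so this parameterization is an assumption rather than the paper's model):

    def aggregate_crowd_ranking(items, votes, reliability):
        # votes: iterable of (worker, winner, loser) pairwise judgments.
        # reliability: dict mapping worker -> weight; a hypothetical stand-in
        # for the per-worker quality parameter mentioned in the abstract.
        score = {item: 0.0 for item in items}
        for worker, winner, loser in votes:
            w = reliability[worker]
            score[winner] += w  # credit the item this worker preferred
            score[loser] -= w   # debit the item it was compared against
        return sorted(items, key=lambda item: score[item], reverse=True)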
Learning from Human Feedback: Ranking, Bandit, and Preference Optimization
Y Wu - 2024 - search.proquest.com
This dissertation investigates several challenges in artificial intelligence (AI) alignment and
reinforcement learning (RL), particularly focusing on applications when only preference …