Reinforcement learning from human feedback with active queries

K Ji, J He, Q Gu - arXiv preprint arXiv:2402.09401, 2024 - arxiv.org
Aligning large language models (LLMs) with human preferences plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …

Borda regret minimization for generalized linear dueling bandits

Y Wu, T Jin, H Lou, F Farnoud, Q Gu - arXiv preprint arXiv:2303.08816, 2023 - arxiv.org
Dueling bandits are widely used to model preferential feedback prevalent in many
applications such as recommendation systems and ranking. In this paper, we study the …
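(For reference, the Borda score of an arm $i$ among $K$ arms is commonly defined as $B(i) = \frac{1}{K-1}\sum_{j\neq i} p_{ij}$, where $p_{ij}$ is the probability that $i$ wins a duel against $j$, and Borda regret over $T$ rounds compares the dueled pairs $(i_t, j_t)$ against the Borda winner $i^* = \arg\max_i B(i)$, e.g. $\sum_{t=1}^{T}\big(B(i^*) - \tfrac{1}{2}(B(i_t)+B(j_t))\big)$; the exact normalization used in the generalized linear setting studied here may differ.)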

Sparse pairwise re-ranking with pre-trained transformers

L Gienapp, M Fröbe, M Hagen, M Potthast - Proceedings of the 2022 …, 2022 - dl.acm.org
Pairwise re-ranking models predict which of two documents is more relevant to a query and
then aggregate a final ranking from such preferences. This is often more effective than …
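As a concrete illustration of the aggregation step described above, a minimal Python sketch that turns pairwise preferences into a ranking by counting wins (a plain Copeland-style aggregation, not the sparse scheme proposed in the paper; prefer(a, b) is a hypothetical callable standing in for a pairwise re-ranking model):

from collections import defaultdict
from itertools import combinations

def rank_by_wins(docs, prefer):
    # Aggregate pairwise preferences into a ranking by win counts.
    # docs: list of document ids; prefer(a, b): returns True if a is
    # judged more relevant than b (hypothetical pairwise model output).
    wins = defaultdict(int)
    for a, b in combinations(docs, 2):
        if prefer(a, b):
            wins[a] += 1
        else:
            wins[b] += 1
    return sorted(docs, key=lambda d: wins[d], reverse=True)

# Toy usage with a preference oracle that favors the lexicographically smaller id.
print(rank_by_wins(["d1", "d2", "d3"], lambda a, b: a < b))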

Active ranking without strong stochastic transitivity

H Lou, T Jin, Y Wu, P Xu, Q Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Ranking from noisy comparisons is of great practical interest in machine learning. In this
paper, we consider the problem of recovering the exact full ranking for a list of items under …
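(For reference, strong stochastic transitivity (SST) is usually stated as: if $p_{ij}$ denotes the probability that item $i$ beats item $j$ and the underlying order satisfies $i \succ j \succ k$, then $p_{ik} \ge \max(p_{ij}, p_{jk})$. Per the title, exact ranking recovery is pursued without this assumption; the statement above is the textbook form and may differ in detail from the paper's precise conditions.)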

Ranking a Set of Objects using Heterogeneous Workers: QUITE an Easy Problem

A Nordio, E Leonardi - arXiv preprint arXiv:2310.02016, 2023 - arxiv.org
We focus on the problem of ranking $N$ objects starting from a set of noisy pairwise
comparisons provided by a crowd of unequal workers, each worker being characterized by a …
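(One standard way to formalize unequal workers, which may differ from the model adopted here: each worker $w$ answers a queried comparison between $i$ and $j$ correctly with probability $1-\delta_w$, i.e. $\Pr[w \text{ reports } i \succ j \mid i \succ j] = 1-\delta_w$, so the aggregation must estimate per-worker error rates alongside the item order.)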

Learning from Human Feedback: Ranking, Bandit, and Preference Optimization

Y Wu - 2024 - search.proquest.com
This dissertation investigates several challenges in artificial intelligence (AI) alignment and
reinforcement learning (RL), particularly focusing on applications where only preference …