Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press
We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

Crowdsourced data management: A survey

G Li, J Wang, Y Zheng… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Many important data management and analytics tasks cannot be completely addressed by
automated processes. These tasks, such as entity resolution, sentiment analysis, and image …

[BOOK][B] Communication Complexity and Applications

A Rao, A Yehudayoff - 2020 - books.google.com
Communication complexity is the mathematical study of scenarios where several parties
need to communicate to achieve a common goal, a situation that naturally appears during …

Good quantum error-correcting codes exist

AR Calderbank, PW Shor - Physical Review A, 1996 - APS
A quantum error-correcting code is defined to be a unitary mapping (encoding) of k qubits
(two-state quantum systems) into a subspace of the quantum state space of n qubits such …

Revocation and tracing schemes for stateless receivers

D Naor, M Naor, J Lotspiech - … in Cryptology—CRYPTO 2001: 21st Annual …, 2001 - Springer
We deal with the problem of a center sending a message to a group of users such that some
subset of the users is considered revoked and should not be able to obtain the content of the …

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …

The k-armed dueling bandits problem

Y Yue, J Broder, R Kleinberg, T Joachims - Journal of Computer and …, 2012 - Elsevier
We study a partial-information online-learning problem where actions are restricted to noisy
comparisons between pairs of strategies (also known as bandits). In contrast to conventional …

Efficient ranking from pairwise comparisons

F Wauthier, M Jordan, N Jojic - International Conference on …, 2013 - proceedings.mlr.press
The ranking of n objects based on pairwise comparisons is a core machine learning
problem, arising in recommender systems, ad placement, player ranking, biological …

Large-scale validation and analysis of interleaved search evaluation

O Chapelle, T Joachims, F Radlinski… - ACM Transactions on …, 2012 - dl.acm.org
Interleaving is an increasingly popular technique for evaluating information retrieval systems
based on implicit user feedback. While a number of isolated studies have analyzed how this …

How to compress interactive communication

B Barak, M Braverman, X Chen, A Rao - Proceedings of the forty-second …, 2010 - dl.acm.org
We describe new ways to simulate 2-party communication protocols to get protocols with
potentially smaller communication. We show that every communication protocol that …