Principled reinforcement learning with human feedback from pairwise or k-wise comparisons
We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …
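To illustrate the setting the snippet names (a linear reward under a Bradley-Terry comparison model), here is a minimal sketch of maximum-likelihood estimation of the reward parameter from pairwise preferences. It is not the paper's code. The function names, the small L2 penalty, the step size and the synthetic data are all assumptions made for illustration. The model is P(a preferred over b) = sigmoid(theta . (phi(a) - phi(b))).

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear_bt(diffs, labels, lr=1.0, steps=300, l2=1e-3):
    """Hypothetical helper: MLE of theta for a Bradley-Terry model with
    linear reward r(x) = theta . phi(x).

    diffs[i]  = phi(a_i) - phi(b_i), the feature difference of pair i
    labels[i] = 1 if a_i was preferred over b_i, else 0
    Full-batch gradient ascent on the mean log-likelihood; the L2 term is
    an assumption that keeps the estimate finite on separable data.
    """
    dim = len(diffs[0])
    n = len(diffs)
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [-l2 * t for t in theta]
        for d, y in zip(diffs, labels):
            p = sigmoid(sum(t * x for t, x in zip(theta, d)))
            for k in range(dim):
                grad[k] += (y - p) * d[k] / n
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Synthetic preferences drawn from an assumed true parameter.
rng = random.Random(0)
true_theta = [2.0, -1.0]
diffs = [[rng.gauss(0.0, 1.0) for _ in true_theta] for _ in range(1000)]
labels = [
    1 if rng.random() < sigmoid(sum(t * x for t, x in zip(true_theta, d))) else 0
    for d in diffs
]
theta_hat = fit_linear_bt(diffs, labels)
```

With 1,000 synthetic comparisons the estimate should land close to the assumed true parameter.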
Crowdsourced data management: A survey
Many important data management and analytics tasks cannot be completely addressed by
automated processes. These tasks, such as entity resolution, sentiment analysis, and image …
[BOOK] Communication Complexity and Applications
A Rao, A Yehudayoff - 2020 - books.google.com
Communication complexity is the mathematical study of scenarios where several parties
need to communicate to achieve a common goal, a situation that naturally appears during …
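A classic scenario of this kind is the equality problem: Alice holds an n-bit string x, Bob holds y, and they want to know whether x = y. Deterministically this needs about n bits, but with randomness a short fingerprint suffices. The sketch below uses polynomial fingerprinting over a prime field. It is a standard textbook example, not code from the book, and the prime and function names are assumptions. If x differs from y, the difference polynomial is nonzero of degree below n, so a random evaluation point exposes it except with probability at most (n - 1)/p.

```python
import random

P = (1 << 61) - 1  # Mersenne prime 2^61 - 1; the field size is an assumption


def fingerprint(bits, r, p=P):
    # Horner's rule: computes sum(bits[i] * r**i) mod p.
    acc = 0
    for b in reversed(bits):
        acc = (acc * r + b) % p
    return acc


def equality_protocol(x, y, rng, p=P):
    """Hypothetical one-round protocol: Alice sends (r, fingerprint(x, r)),
    roughly 2 * 61 bits instead of len(x) bits; Bob compares with his own."""
    assert len(x) == len(y), "inputs are assumed to have equal length"
    r = rng.randrange(p)            # Alice's random evaluation point
    message = (r, fingerprint(x, r, p))
    r_received, fx = message        # Bob's side
    return fingerprint(y, r_received, p) == fx


rng = random.Random(1)
x = [1, 0, 1, 1, 0, 0, 1, 0]
```

Equal inputs are always accepted; unequal inputs are wrongly accepted only when r is a root of the difference polynomial.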
Good quantum error-correcting codes exist
AR Calderbank, PW Shor - Physical Review A, 1996 - APS
A quantum error-correcting code is defined to be a unitary map (encoding) of k qubits
(two-state quantum systems) into a subspace of the quantum state space of n qubits such …
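As a worked instance of this definition with k = 1 and n = 3, the standard three-qubit repetition (bit-flip) code is shown below. It is included only to make the definition concrete. It corrects a single bit flip but not phase errors, so it is not an example of the codes the paper constructs.

```latex
\begin{aligned}
\alpha\,|0\rangle + \beta\,|1\rangle \;&\longmapsto\; \alpha\,|000\rangle + \beta\,|111\rangle, \\
\mathcal{C} &= \operatorname{span}\{\,|000\rangle,\ |111\rangle\,\} \subset (\mathbb{C}^{2})^{\otimes 3},
\qquad \dim \mathcal{C} = 2^{k} = 2 \ \text{inside a space of dimension}\ 2^{n} = 8.
\end{aligned}
```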
Revocation and tracing schemes for stateless receivers
We deal with the problem of a center sending a message to a group of users such that some
subset of the users is considered revoked and should not be able to obtain the content of the …
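One subset-cover idea from this line of work is the complete-subtree method. Users are the leaves of a complete binary tree, each node carries a key, and the center encrypts the message key under the roots of the maximal subtrees that contain no revoked user. The sketch below computes only that cover and omits all cryptography. The heap-style node numbering (root 1, children 2i and 2i+1, leaves N..2N-1) and the function names are assumptions.

```python
def complete_subtree_cover(num_users, revoked):
    """Hypothetical helper: node ids whose subtrees exactly cover the
    non-revoked users. num_users is assumed to be a power of two."""
    assert num_users > 0 and num_users & (num_users - 1) == 0
    if not revoked:
        return [1]  # nobody revoked: the root key covers everyone
    # Steiner tree: every ancestor of a revoked leaf, up to the root.
    steiner = set()
    for user in revoked:
        node = num_users + user
        while node >= 1:
            steiner.add(node)
            node //= 2
    # Cover: children that hang off the Steiner tree.
    cover = []
    for node in steiner:
        if node >= num_users:
            continue  # leaves have no children
        for child in (2 * node, 2 * node + 1):
            if child not in steiner:
                cover.append(child)
    return sorted(cover)


def users_under(node, num_users):
    # Leaves below `node`, translated back to user indices.
    lo, hi = node, node
    while lo < num_users:
        lo, hi = 2 * lo, 2 * hi + 1
    return set(range(lo - num_users, hi - num_users + 1))
```

For 8 users with users 2 and 5 revoked, the cover is nodes 4, 7, 11 and 12, i.e. users {0, 1}, {6, 7}, {3} and {4}.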
Batched multi-armed bandits problem
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
The k-armed dueling bandits problem
We study a partial-information online-learning problem where actions are restricted to noisy
comparisons between pairs of strategies (also known as bandits). In contrast to conventional …
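The feedback model can be sketched as follows: a preference matrix P gives the probability that arm i beats arm j, and the learner observes only the outcome of each noisy duel. The sketch simulates duels and uses a naive uniform round-robin to find an empirical Condorcet winner. It is an assumption-laden baseline with linear regret, not the algorithm the paper proposes. The matrix values and function names are hypothetical.

```python
import random


def duel(P, i, j, rng):
    # Noisy comparison: True if arm i beats arm j, with probability P[i][j].
    return rng.random() < P[i][j]


def empirical_condorcet_winner(P, rounds, rng):
    """Hypothetical explore-only baseline: duel every pair `rounds` times,
    then return the arm that wins the majority of duels against every other
    arm, or None if no such arm exists."""
    k = len(P)
    wins = [[0] * k for _ in range(k)]
    for _ in range(rounds):
        for i in range(k):
            for j in range(i + 1, k):
                if duel(P, i, j, rng):
                    wins[i][j] += 1
                else:
                    wins[j][i] += 1
    for i in range(k):
        if all(wins[i][j] > wins[j][i] for j in range(k) if j != i):
            return i
    return None


# Assumed preference matrix with P[i][j] + P[j][i] = 1; arm 1 beats both others.
P = [[0.5, 0.3, 0.6],
     [0.7, 0.5, 0.8],
     [0.4, 0.2, 0.5]]
winner = empirical_condorcet_winner(P, 500, random.Random(0))
```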
Efficient ranking from pairwise comparisons
The ranking of n objects based on pairwise comparisons is a core machine learning
problem, arising in recommender systems, ad placement, player ranking, biological …
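A simple counting baseline for this task ranks each object by the fraction of its comparisons it won (a Borda-style score). The sketch below is not necessarily the paper's estimator. The input format, a list of (winner, loser) index pairs, and the neutral score of 0.5 for objects never compared are assumptions.

```python
def borda_rank(n, comparisons):
    """Hypothetical helper: rank objects 0..n-1 by empirical win rate.
    comparisons is a list of (winner, loser) pairs; ties break by index."""
    wins = [0] * n
    games = [0] * n
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    score = [wins[i] / games[i] if games[i] else 0.5 for i in range(n)]
    return sorted(range(n), key=lambda i: (-score[i], i))


comparisons = [(0, 1), (0, 2), (1, 2), (0, 1), (2, 1)]
ranking = borda_rank(3, comparisons)
```

Here object 0 wins 3 of 3, object 2 wins 1 of 3 and object 1 wins 1 of 4, giving the ranking [0, 2, 1].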
Large-scale validation and analysis of interleaved search evaluation
Interleaving is an increasingly popular technique for evaluating information retrieval systems
based on implicit user feedback. While a number of isolated studies have analyzed how this …
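Team-draft interleaving is one widely used way to interleave two rankings. At each step the team with fewer picks (a coin flip breaks ties) adds its highest-ranked document not yet shown, and clicks are credited to the team that contributed the clicked document. The snippet does not say which variants the paper evaluates, so treat this as an illustrative sketch. The function names, the length cutoff and the simple click-count comparison are assumptions.

```python
import random


def team_draft_interleave(a, b, length, rng):
    """Hypothetical helper: team-draft interleaving of rankings a and b.
    Returns the interleaved list and the documents credited to each team."""
    interleaved, team_a, team_b, seen = [], set(), set(), set()

    def next_unseen(ranking):
        # Highest-ranked document in `ranking` not yet placed.
        for doc in ranking:
            if doc not in seen:
                return doc
        return None

    while len(interleaved) < length:
        next_a, next_b = next_unseen(a), next_unseen(b)
        if next_a is None or next_b is None:
            break  # stop once either ranking is exhausted
        a_picks = len(team_a) < len(team_b) or (
            len(team_a) == len(team_b) and rng.random() < 0.5
        )
        if a_picks:
            doc = next_a
            team_a.add(doc)
        else:
            doc = next_b
            team_b.add(doc)
        interleaved.append(doc)
        seen.add(doc)
    return interleaved, team_a, team_b


def credit_clicks(clicked, team_a, team_b):
    # The team whose documents received more clicks wins this impression.
    clicks_a = sum(1 for d in clicked if d in team_a)
    clicks_b = sum(1 for d in clicked if d in team_b)
    if clicks_a > clicks_b:
        return "a"
    if clicks_b > clicks_a:
        return "b"
    return "tie"


rng = random.Random(0)
interleaved, team_a, team_b = team_draft_interleave(
    ["d1", "d2", "d3", "d4"], ["d2", "d1", "d5", "d6"], 4, rng
)
```

Aggregating these per-impression outcomes over many queries gives a preference between the two retrieval systems.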
How to compress interactive communication
We describe new ways to simulate 2-party communication protocols to get protocols with
potentially smaller communication. We show that every communication protocol that …