Is RLHF more difficult than standard RL? A theoretical perspective
Reinforcement Learning from Human Feedback (RLHF) learns from preference
signals, while standard Reinforcement Learning (RL) directly learns from reward signals …
Unified algorithms for RL with decision-estimation coefficients: No-regret, PAC, and reward-free learning
Finding unified complexity measures and algorithms for sample-efficient learning is a central
topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) …
Making RL with preference-based feedback efficient via randomization
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …
Lower bounds for learning in revealing POMDPs
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging
partially observable setting. While it is well-established that learning in Partially Observable …
On the complexity of multi-agent decision making: From learning in games to partial monitoring
A central problem in the theory of multi-agent reinforcement learning (MARL) is to
understand what structural conditions and algorithmic principles lead to sample-efficient …
Posterior sampling for competitive RL: function approximation and partial observation
This paper investigates posterior sampling algorithms for competitive reinforcement learning
(RL) in the context of general function approximations. Focusing on zero-sum Markov games …
Partially observable RL with B-stability: Unified structural condition and sharp sample-efficient algorithms
Partial observability, where agents can only observe partial information about the true
underlying state of the system, is ubiquitous in real-world applications of Reinforcement …
Optimistic policy gradient in multi-player Markov games with a single controller: Convergence beyond the Minty property
Policy gradient methods enjoy strong practical performance in numerous tasks in
reinforcement learning. Their theoretical understanding in multi-agent settings, however …
Provably efficient UCB-type algorithms for learning predictive state representations
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …
Less is more: Robust robot learning via partially observable multi-agent reinforcement learning
In many multi-agent and high-dimensional robotic tasks, the controller can be designed in
either a centralized or decentralized way. Correspondingly, it is possible to use either single …