Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

RL, but don't do anything I wouldn't do

MK Cohen, M Hutter, Y Bengio, S Russell - arXiv preprint arXiv …, 2024 - arxiv.org
In reinforcement learning, if the agent's reward differs from the designers' true utility, even
only rarely, the state distribution resulting from the agent's policy can be very bad, in theory …

Limit-Computable Grains of Truth for Arbitrary Computable Extensive-Form (Un)Known Games

C Wyeth, M Hutter, J Leike, J Taylor - 2024 - colewyeth.com
A Bayesian agent acting in a multi-agent environment learns to predict the other agents'
policies if its prior assigns positive probability to them (in other words, its prior contains a …

Don't do anything I'd never do? KL regularization to an imitative “base model” does not implement this