Learning to explore in POMDPs with informational rewards

A Xie, LM Bhamidipaty, EZ Liu, J Hong… - … on Machine Learning, 2024 - openreview.net
Standard exploration methods typically rely on random coverage of the state space or
coverage-promoting exploration bonuses. However, in partially observed settings, the …

Bayesian design principles for frequentist sequential learning

Y Xu, A Zeevi - International Conference on Machine …, 2023 - proceedings.mlr.press
We develop a general theory to optimize the frequentist regret for sequential learning
problems, where efficient bandit and reinforcement learning algorithms can be derived from …

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Deciding what to model: Value-equivalent sampling for reinforcement learning

D Arumugam, B Van Roy - Advances in neural information …, 2022 - proceedings.neurips.cc
The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Value of Information and Reward Specification in Active Inference and POMDPs

R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org
Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …

Probabilistic inference in reinforcement learning done right

J Tarbouriech, T Lattimore… - Advances in Neural …, 2023 - proceedings.neurips.cc
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …

Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning

Q Zhang, C Bai, S Hu, Z Wang, X Li - arXiv preprint arXiv:2404.19292, 2024 - arxiv.org
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …