Optimistic active exploration of dynamical systems

L Treven, C Sancaktar, S Blaes… - Advances in Neural …, 2023 - proceedings.neurips.cc
Reinforcement learning algorithms commonly seek to optimize policies for solving one
particular task. How should we explore an unknown dynamical system such that the …

Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Value of Information and Reward Specification in Active Inference and POMDPs

R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org
Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …

Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning

Q Zhang, C Bai, S Hu, Z Wang, X Li - arXiv preprint arXiv:2404.19292, 2024 - arxiv.org
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …

Re-move: An adaptive policy design approach for dynamic environments via language-based feedback

S Chakraborty, K Weerakoon, P Poddar… - arXiv preprint arXiv …, 2023 - researchgate.net
Reinforcement learning-based policies for continuous control robotic navigation tasks often
fail to adapt to changes in the environment during real-time deployment, which may result in …

Dealing with sparse rewards in continuous control robotics via heavy-tailed policy optimization

S Chakraborty, AS Bedi, K Weerakoon… - … on Robotics and …, 2023 - ieeexplore.ieee.org
In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG)
algorithm to deal with the challenges of sparse rewards in continuous control problems …

Bandit and RL Reading Notes by Xuanfei

X Ren, P Xu - ustc.edu.cn
Here are some notes on the papers from my study. I think …