- Academic Search

Optimistic active exploration of dynamical systems

L Treven, C Sancaktar, S Blaes… - Advances in Neural …, 2023 - proceedings.neurips.cc

Reinforcement learning algorithms commonly seek to optimize policies for solving one
particular task. How should we explore an unknown dynamical system such that the …

Cited by 7 · All 7 versions

[PDF] openreview.net

Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net

Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Cited by 1 · All 4 versions

[PDF] arxiv.org

Value of Information and Reward Specification in Active Inference and POMDPs

R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org

Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …

Cited by 2 · All 3 versions

[PDF] arxiv.org

Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning

Q Zhang, C Bai, S Hu, Z Wang, X Li - arXiv preprint arXiv:2404.19292, 2024 - arxiv.org

This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …

Cited by 2 · All 2 versions

[PDF] researchgate.net

[PDF] Re-move: An adaptive policy design approach for dynamic environments via language-based feedback

S Chakraborty, K Weerakoon, P Poddar… - arXiv preprint arXiv …, 2023 - researchgate.net

Reinforcement learning-based policies for continuous control robotic navigation tasks often
fail to adapt to changes in the environment during real-time deployment, which may result in …

Cited by 5

Dealing with sparse rewards in continuous control robotics via heavy-tailed policy optimization

S Chakraborty, AS Bedi, K Weerakoon… - … on Robotics and …, 2023 - ieeexplore.ieee.org

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG)
algorithm to deal with the challenges of sparse rewards in continuous control problems …

Cited by 3 · All 2 versions

[PDF] ustc.edu.cn

[PDF] Bandit and RL Reading Notes by Xuanfei

X Ren, P Xu - ustc.edu.cn

Abstract: Here are some notes on the papers from my study. I think …

Steering: Stein information directed exploration for model-based reinforcement learning

Optimistic active exploration of dynamical systems