Learning to explore in POMDPs with informational rewards
Standard exploration methods typically rely on random coverage of the state space or
coverage-promoting exploration bonuses. However, in partially observed settings, the …
Bayesian design principles for frequentist sequential learning
We develop a general theory to optimize the frequentist regret for sequential learning
problems, where efficient bandit and reinforcement learning algorithms can be derived from …
Bayesian reinforcement learning with limited cognitive load
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …
Deciding what to model: Value-equivalent sampling for reinforcement learning
The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
Leveraging demonstrations to improve online learning: Quality matters
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …
Information-directed pessimism for offline reinforcement learning
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …
Value of Information and Reward Specification in Active Inference and POMDPs
R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org
Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …
Probabilistic inference in reinforcement learning done right
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …
Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …