Learning to explore in POMDPs with informational rewards
Standard exploration methods typically rely on random coverage of the state space or
coverage-promoting exploration bonuses. However, in partially observed settings, the …
Bayesian design principles for frequentist sequential learning
We develop a general theory to optimize the frequentist regret for sequential learning
problems, where efficient bandit and reinforcement learning algorithms can be derived from …
Bayesian reinforcement learning with limited cognitive load
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …
Deciding what to model: Value-equivalent sampling for reinforcement learning
The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
Leveraging demonstrations to improve online learning: Quality matters
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …
Information-directed pessimism for offline reinforcement learning
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …
Value of Information and Reward Specification in Active Inference and POMDPs
R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org
Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …
Probabilistic inference in reinforcement learning done right
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …
Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …