Knowledge gradient for online reinforcement learning

S Yahyaa, B Manderick - … , ICAART 2014, Angers, France, March 6-8, 2014 …, 2015 - Springer
The most interesting challenge for a reinforcement learning agent is to learn online in
an unknown large discrete or continuous stochastic model. The agent has not only to trade off …

Knowledge Gradient Exploration in Online Least Squares Policy Iteration.

SQ Yahyaa, B Manderick - ICAART (2), 2013 - researchgate.net
We empirically compare the knowledge gradient exploration policy with the ε-greedy one in
online least squares policy iteration on a testbed of 2 infinite-horizon Markov decision …

Knowledge gradient exploration in online kernel-based LSPI

S Yahyaa, B Manderick - Proceedings of the 25th Belgium-Netherlands …, 2013 - Citeseer
We introduce online kernel-based LSPI (or least squares policy iteration), which combines
features of online LSPI and offline kernel-based LSPI. The knowledge gradient is used as …

Explorations in Reinforcement Learning: Online Action Selection and Value Function Approximation

SQ Yahyaa - 2015 - researchgate.net
In reinforcement learning, an agent interacts repeatedly with its environment by selecting an
action and receiving a reward while the environment transits from the current state to the …

Online Knowledge Gradient Exploration in an Unknown Environment.

SQ Yahyaa, B Manderick - ICAART (1), 2014 - scitepress.org
We present online kernel-based LSPI (or least squares policy iteration), which is an
extension of offline kernel-based LSPI. Online kernel-based LSPI combines characteristics of …