Finding counterfactually optimal action sequences in continuous state spaces

S Tsirtsis, M Rodriguez - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Whenever a clinician reflects on the efficacy of a sequence of treatment decisions for a
patient, they may try to identify critical time steps where, had they made different decisions …

Efficient learning in non-stationary linear Markov decision processes

A Touati, P Vincent - arXiv preprint arXiv:2010.12870, 2020 - arxiv.org
We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov
Decision Processes (MDPs), i.e., both the reward and transition kernel are linear with respect …
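The definition is cut off here; for context, the standard linear MDP assumption (which this snippet appears to describe, though the paper's exact non-stationary formulation may differ) is that rewards and transitions are linear in a known d-dimensional feature map:

$$
r_h(s,a) = \phi(s,a)^\top \theta_h, \qquad
P_h(s' \mid s,a) = \phi(s,a)^\top \mu_h(s'),
$$

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is known while $\theta_h$ and $\mu_h$ are unknown; non-stationarity then means these parameters may drift across episodes.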

Metrics and continuity in reinforcement learning

C Le Lan, MG Bellemare, PS Castro - Proceedings of the AAAI …, 2021 - ojs.aaai.org
In most practical applications of reinforcement learning, it is untenable to maintain direct
estimates for individual states; in continuous-state systems, it is impossible. Instead …

Optimistic initialization for exploration in continuous control

S Lobel, O Gottesman, C Allen, A Bagaria… - Proceedings of the …, 2022 - ojs.aaai.org
Optimistic initialization underpins many theoretically sound exploration schemes in tabular
domains; however, in the deep function approximation setting, optimism can quickly …
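As background for the contrast the snippet draws, here is a minimal sketch of optimistic initialization in the tabular case (a generic textbook illustration, not the continuous-control method this paper proposes; all names below are hypothetical):

import numpy as np

def optimistic_q_table(n_states, n_actions, r_max, gamma):
    # Every state-action pair starts at the largest possible discounted return,
    # r_max / (1 - gamma), so unvisited pairs dominate until real data lowers them.
    return np.full((n_states, n_actions), r_max / (1.0 - gamma))

# A greedy policy over this table is pushed toward untried actions, since their
# optimistic values can only shrink as observed returns replace the initial guess.
Q = optimistic_q_table(n_states=10, n_actions=4, r_max=1.0, gamma=0.9)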

Deep radial-basis value functions for continuous control

K Asadi, N Parikh, RE Parr, GD Konidaris… - Proceedings of the …, 2021 - ojs.aaai.org
A core operation in reinforcement learning (RL) is finding an action that is optimal with
respect to a learned value function. This operation is often challenging when the learned …
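The core operation referred to is computing argmax_a Q(s, a) over a continuous action set. A common generic approximation is to score sampled candidate actions under the learned critic (a sampling-based sketch, not the radial-basis approach this paper proposes; names are hypothetical):

import numpy as np

def greedy_action(q_fn, state, action_low, action_high, n_samples=256, rng=None):
    # Approximate argmax_a Q(s, a) over a box-bounded continuous action space by
    # scoring uniformly sampled candidate actions under the learned critic q_fn.
    rng = np.random.default_rng() if rng is None else rng
    candidates = rng.uniform(action_low, action_high,
                             size=(n_samples, np.size(action_low)))
    scores = np.array([q_fn(state, a) for a in candidates])
    return candidates[int(np.argmax(scores))]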

Control with adaptive Q-learning: A comparison for two classical control problems

JP Araujo, MAT Figueiredo, MA Botto - Engineering Applications of Artificial …, 2022 - Elsevier
This paper evaluates adaptive Q-learning (AQL) and single-partition adaptive Q-learning
(SPAQL), two algorithms for efficient model-free episodic reinforcement learning (RL), in two …

Adaptive discretization using Voronoi trees for continuous-action POMDPs

M Hoerger, H Kurniawati, D Kroese, N Ye - International Workshop on the …, 2022 - Springer
Solving Partially Observable Markov Decision Processes (POMDPs) with
continuous actions is challenging, particularly for high-dimensional action spaces. To …

Adaptive Discretization using Voronoi Trees for Continuous POMDPs

M Hoerger, H Kurniawati… - … International Journal of …, 2024 - journals.sagepub.com
Solving continuous Partially Observable Markov Decision Processes (POMDPs) is
challenging, particularly for high-dimensional continuous action spaces. To alleviate this …

Review of Metrics to Measure the Stability, Robustness and Resilience of Reinforcement Learning

LL Pullum - arXiv preprint arXiv:2203.12048, 2022 - arxiv.org
Reinforcement learning has received significant interest in recent years, due primarily to the
successes of deep reinforcement learning at solving many challenging tasks such as …

Provably adaptive reinforcement learning in metric spaces

T Cao, A Krishnamurthy - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study reinforcement learning in continuous state and action spaces endowed with a
metric. We provide a refined analysis of the algorithm of Sinclair, Banerjee, and Yu (2019) …