Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org
Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

A kernel-based approach to non-stationary reinforcement learning in metric spaces

OD Domingues, P Ménard, M Pirotta… - International …, 2021 - proceedings.mlr.press
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-
stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a …

Q-learning for MDPs with general spaces: Convergence and near optimality via quantization under weak continuity

A Kara, N Saldi, S Yüksel - Journal of Machine Learning Research, 2023 - jmlr.org
Reinforcement learning algorithms often require finiteness of state and action spaces in
Markov decision processes (MDPs) (also called controlled Markov chains) and various …

Overcoming the long horizon barrier for sample-efficient reinforcement learning with latent low-rank structure

T Sam, Y Chen, CL Yu - Proceedings of the ACM on Measurement and …, 2023 - dl.acm.org
The practicality of reinforcement learning algorithms has been limited due to poor scaling
with respect to the problem size, as the sample complexity of learning an ε-optimal policy is …

Lipschitz bandits with batched feedback

Y Feng, T Wang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In this paper, we study Lipschitz bandit problems with batched feedback, where the
expected reward is Lipschitz and the reward observations are communicated to the player in …

Effects of sampling and prediction horizon in reinforcement learning

P Osinenko, D Dobriborsci - IEEE Access, 2021 - ieeexplore.ieee.org
Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation,
unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to …

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

Y Song, L Wu, DJ Foster, A Krishnamurthy - arXiv preprint arXiv …, 2024 - arxiv.org
Sample-efficiency and reliability remain major bottlenecks toward wide adoption of
reinforcement learning algorithms in continuous settings with high-dimensional perceptual …

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation

K Zhao, H Huang, M Li, Y Wu - arXiv preprint arXiv:2411.15222, 2024 - arxiv.org
Language-conditioned robotic learning has significantly enhanced robot adaptability by
enabling a single model to execute diverse tasks in response to verbal commands. Despite …