Adaptive discretization in online reinforcement learning
Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …
A kernel-based approach to non-stationary reinforcement learning in metric spaces
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-
stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a …
Q-learning for MDPs with general spaces: Convergence and near optimality via quantization under weak continuity
Reinforcement learning algorithms often require finiteness of state and action spaces in
Markov decision processes (MDPs) (also called controlled Markov chains) and various …
Overcoming the long horizon barrier for sample-efficient reinforcement learning with latent low-rank structure
The practicality of reinforcement learning algorithms has been limited due to poor scaling
with respect to the problem size, as the sample complexity of learning an ε-optimal policy is …
Lipschitz bandits with batched feedback
In this paper, we study Lipschitz bandit problems with batched feedback, where the
expected reward is Lipschitz and the reward observations are communicated to the player in …
Effects of sampling and prediction horizon in reinforcement learning
Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation,
unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to …
Rich-Observation Reinforcement Learning with Continuous Latent Dynamics
Sample-efficiency and reliability remain major bottlenecks toward wide adoption of
reinforcement learning algorithms in continuous settings with high-dimensional perceptual …
Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation
Language-conditioned robotic learning has significantly enhanced robot adaptability by
enabling a single model to execute diverse tasks in response to verbal commands. Despite …