Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

[HTML][HTML] Review of online learning for control and diagnostics of power converters and drives: Algorithms, implementations and applications

M Zhang, PI Gómez, Q Xu, T Dragicevic - Renewable and Sustainable …, 2023 - Elsevier
Power converters and motor drives are playing a significant role in the transition towards
sustainable energy systems and transportation electrification. In this context, rich diversity of …

Adversarially trained actor critic for offline reinforcement learning

CA Cheng, T ** for uncertainty-driven offline reinforcement learning
C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arxiv preprint arxiv …, 2022 - arxiv.org
Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …

Pessimistic q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Hybrid rl: Using both offline and online data can make rl efficient

Y Song, Y Zhou, A Sekhari, JA Bagnell… - arxiv preprint arxiv …, 2022 - arxiv.org
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has
access to an offline dataset and the ability to collect experience via real-world online …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arxiv preprint arxiv:2107.06226, 2021 - arxiv.org
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
Settling the sample complexity of model-based offline reinforcement learning Page 1 The
Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …

Reinforcement learning with human feedback: Learning dynamic choices via pessimism

Z Li, Z Yang, M Wang - arxiv preprint arxiv:2305.18438, 2023 - arxiv.org
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where
we aim to learn the human's underlying reward and the MDP's optimal policy from a set of …