A comprehensive survey of continual learning: Theory, method and application

L Wang, X Zhang, H Su, J Zhu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
To cope with real-world dynamics, an intelligent system needs to incrementally acquire,
update, accumulate, and exploit knowledge throughout its lifetime. This ability, known as …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …
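The MDP optimization this survey snippet refers to can be grounded with a toy example: when the transition and reward model is known, value iteration solves the MDP directly by repeated Bellman backups. The tiny 2-state MDP below is an illustrative assumption for the sketch, not an example taken from the survey.

```python
# Minimal value-iteration sketch for a tiny *known* MDP.
# The 2-state, 2-action MDP here is made up for illustration.

GAMMA = 0.9

# P[s][a] = list of (probability, next_state, reward) triples
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}

def value_iteration(P, gamma=GAMMA, tol=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best expected one-step return
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

V = value_iteration(P)
# Self-looping on action 1 in state 1 yields V[1] = 2 / (1 - 0.9) = 20,
# and V[0] = 1 + 0.9 * V[1] = 19.
```

This is the "planning" half of the dichotomy; model-free methods instead estimate such values from sampled interaction without access to `P`.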

Model-based offline planning

A Argenson, G Dulac-Arnold - arXiv preprint arXiv:2008.05556, 2020 - arxiv.org
Offline learning is a key part of making reinforcement learning (RL) usable in real systems.
Offline RL looks at scenarios where there is data from a system's operation, but no direct …

Continual world: A robotic benchmark for continual reinforcement learning

M Wołczyk, M Zając, R Pascanu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Continual learning (CL), the ability to continuously learn, building on previously
acquired knowledge, is a natural requirement for long-lived autonomous reinforcement …

Optimizing for the future in non-stationary MDPs

Y Chandak, G Theocharous… - International …, 2020 - proceedings.mlr.press
Most reinforcement learning methods are based upon the key assumption that the transition
dynamics and reward functions are fixed, that is, the underlying Markov decision process is …

Prediction and control in continual reinforcement learning

N Anand, D Precup - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Temporal difference (TD) learning is often used to update the estimate of the value function,
which is used by RL agents to extract useful policies. In this paper, we focus on value …
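As a concrete illustration of the TD update this snippet describes (not code from the paper), here is tabular TD(0) value estimation on a made-up 1-D random walk; the chain size, step size, and reward scheme are all assumptions for the sketch.

```python
import random

# Tabular TD(0) on a toy 5-state random walk: start in the middle,
# step left or right uniformly, reward 1 only for exiting off the
# right end. All parameters are illustrative assumptions.

random.seed(0)
N = 5                       # non-terminal states 0..4
ALPHA, GAMMA = 0.1, 1.0
V = [0.0] * N               # value estimates under the random policy

def episode():
    s = N // 2
    while 0 <= s < N:
        s2 = s + random.choice([-1, 1])
        r = 1.0 if s2 == N else 0.0
        # Bootstrap from V[s2] unless s2 is terminal
        target = r + (GAMMA * V[s2] if 0 <= s2 < N else 0.0)
        V[s] += ALPHA * (target - V[s])   # TD(0) update
        s = s2

for _ in range(5000):
    episode()
```

After enough episodes the estimates approach the true values (i + 1) / 6 for state i, increasing from left to right, which is the value function a policy-extraction step would then rank actions against.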

Reset-free lifelong learning with skill-space planning

K Lu, A Grover, P Abbeel, I Mordatch - arXiv preprint arXiv:2012.03548, 2020 - arxiv.org
The objective of lifelong reinforcement learning (RL) is to optimize agents which can
continuously adapt and interact in changing environments. However, current RL approaches …

Learning skills to patch plans based on inaccurate models

A Lagrassa, S Lee, O Kroemer - 2020 IEEE/RSJ International …, 2020 - ieeexplore.ieee.org
Planners using accurate models can be effective for accomplishing manipulation tasks in the
real world, but are typically highly specialized and require significant fine-tuning to be …

Neural-progressive hedging: Enforcing constraints in reinforcement learning with stochastic programming

S Ghosh, L Wynter, SH Lim… - Uncertainty in Artificial …, 2022 - proceedings.mlr.press
We propose a framework, called neural-progressive hedging (NP), that leverages stochastic
programming during the online phase of executing a reinforcement learning (RL) policy. The …

Uncertainty-sensitive learning and planning with ensembles

P Miłoś, Ł Kuciński, K Czechowski… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose a reinforcement learning framework for discrete environments in which an
agent makes both strategic and tactical decisions. The former manifests itself through the …