The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Towards continual reinforcement learning: A review and perspectives

K Khetarpal, M Riemer, I Rish, D Precup - Journal of Artificial Intelligence …, 2022 - jair.org
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …

Scaling laws for reward model overoptimization

L Gao, J Schulman, J Hilton - International Conference on …, 2023 - proceedings.mlr.press
In reinforcement learning from human feedback, it is common to optimize against a reward
model trained to predict human preferences. Because the reward model is an imperfect …

A survey of zero-shot generalisation in deep reinforcement learning

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …

Leveraging procedural generation to benchmark reinforcement learning

K Cobbe, C Hesse, J Hilton… - … conference on machine …, 2020 - proceedings.mlr.press
We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like
environments designed to benchmark both sample efficiency and generalization in …

Quantifying generalization in reinforcement learning

K Cobbe, O Klimov, C Hesse, T Kim… - … on machine learning, 2019 - proceedings.mlr.press
In this paper, we investigate the problem of overfitting in deep reinforcement learning.
Among the most common benchmarks in RL, it is customary to use the same environments …

Loss of plasticity in continual deep reinforcement learning

Z Abbas, R Zhao, J Modayil, A White… - … on lifelong learning …, 2023 - proceedings.mlr.press
In this paper, we characterize the behavior of canonical value-based deep reinforcement
learning (RL) approaches under varying degrees of non-stationarity. In particular, we …

Stabilizing deep Q-learning with ConvNets and Vision Transformers under data augmentation

N Hansen, H Su, X Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
While agents trained by Reinforcement Learning (RL) can solve increasingly challenging
tasks directly from visual observations, generalizing learned skills to novel environments …

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Stop regressing: Training value functions via classification for scalable deep RL

J Farebrother, J Orbay, Q Vuong, AA Taïga… - arXiv preprint arXiv …, 2024 - arxiv.org
Value functions are a central component of deep reinforcement learning (RL). These
functions, parameterized by neural networks, are trained using a mean squared error …