The surprising effectiveness of ppo in cooperative multi-agent games

C Yu, A Velu, E Vinitsky, J Gao… - Advances in …, 2022 - proceedings.neurips.cc
Abstract Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning
algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent …

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

Efficient online reinforcement learning with offline data

PJ Ball, L Smith, I Kostrikov… - … Conference on Machine …, 2023 - proceedings.mlr.press
Sample efficiency and exploration remain major challenges in online reinforcement learning
(RL). A powerful approach that can be applied to address these issues is the inclusion of …

Rambo-rl: Robust adversarial model-based offline reinforcement learning

M Rigter, B Lacerda, N Hawes - Advances in neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …

Loss of plasticity in deep continual learning

S Dohare, JF Hernandez-Garcia, Q Lan, P Rahman… - Nature, 2024 - nature.com
Artificial neural networks, deep-learning methods and the backpropagation algorithm form
the foundation of modern machine learning and artificial intelligence. These methods are …

Secrets of rlhf in large language models part i: Ppo

R Zheng, S Dou, S Gao, Y Hua, W Shen… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …

Recurrent model-free rl can be a strong baseline for many pomdps

T Ni, B Eysenbach, R Salakhutdinov - arxiv preprint arxiv:2110.05038, 2021 - arxiv.org
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …

A survey on transformers in reinforcement learning

W Li, H Luo, Z Lin, C Zhang, Z Lu, D Ye - arxiv preprint arxiv:2301.03044, 2023 - arxiv.org
Transformer has been considered the dominating neural architecture in NLP and CV, mostly
under supervised settings. Recently, a similar surge of using Transformers has appeared in …

Hyperparameters in reinforcement learning and how to tune them

T Eimer, M Lindauer… - … Conference on Machine …, 2023 - proceedings.mlr.press
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting
better scientific practices such as standardized evaluation metrics and reporting. However …

Reinforcement learning in practice: Opportunities and challenges

Y Li - arxiv preprint arxiv:2202.11296, 2022 - arxiv.org
This article is a gentle discussion about the field of reinforcement learning in practice, about
opportunities and challenges, touching a broad range of topics, with perspectives and …