The surprising effectiveness of PPO in cooperative multi-agent games
Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning
algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent …
A minimalist approach to offline reinforcement learning
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …
Efficient online reinforcement learning with offline data
Sample efficiency and exploration remain major challenges in online reinforcement learning
(RL). A powerful approach that can be applied to address these issues is the inclusion of …
RAMBO-RL: Robust adversarial model-based offline reinforcement learning
Offline reinforcement learning (RL) aims to find performant policies from logged data without
further environment interaction. Model-based algorithms, which learn a model of the …
Loss of plasticity in deep continual learning
Artificial neural networks, deep-learning methods and the backpropagation algorithm form
the foundation of modern machine learning and artificial intelligence. These methods are …
Secrets of RLHF in large language models part I: PPO
Large language models (LLMs) have formulated a blueprint for the advancement of artificial
general intelligence. Its primary objective is to function as a human-centric (helpful, honest …
Recurrent model-free RL can be a strong baseline for many POMDPs
Many problems in RL, such as meta-RL, robust RL, generalization in RL, and temporal credit
assignment, can be cast as POMDPs. In theory, simply augmenting model-free RL with …
A survey on transformers in reinforcement learning
Transformer has been considered the dominating neural architecture in NLP and CV, mostly
under supervised settings. Recently, a similar surge of using Transformers has appeared in …
Hyperparameters in reinforcement learning and how to tune them
In order to improve reproducibility, deep reinforcement learning (RL) has been adopting
better scientific practices such as standardized evaluation metrics and reporting. However …
Reinforcement learning in practice: Opportunities and challenges
Y Li - arXiv preprint arXiv:2202.11296, 2022 - arxiv.org
This article is a gentle discussion about the field of reinforcement learning in practice, about
opportunities and challenges, touching a broad range of topics, with perspectives and …