A survey on offline reinforcement learning: Taxonomy, review, and open problems
RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …
experienced a dramatic increase in popularity, scaling to previously intractable problems …
A generalist agent
Inspired by progress in large-scale language modeling, we apply a similar approach
towards building a single generalist agent beyond the realm of text outputs. The agent …
towards building a single generalist agent beyond the realm of text outputs. The agent …
Dataset distillation by matching training trajectories
Dataset distillation is the task of synthesizing a small dataset such that a model trained on
the synthetic set will match the test accuracy of the model trained on the full dataset. In this …
the synthetic set will match the test accuracy of the model trained on the full dataset. In this …
A survey of zero-shot generalisation in deep reinforcement learning
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …
produce RL algorithms whose policies generalise well to novel unseen situations at …
Multi-game decision transformers
A longstanding goal of the field of AI is a method for learning a highly capable, generalist
agent from diverse experience. In the subfields of vision and language, this was largely …
agent from diverse experience. In the subfields of vision and language, this was largely …
Is pessimism provably efficient for offline rl?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
a dataset collected a priori. Due to the lack of further interactions with the environment …
Implicit behavioral cloning
We find that across a wide range of robot policy learning scenarios, treating supervised
policy learning with an implicit model generally performs better, on average, than commonly …
policy learning with an implicit model generally performs better, on average, than commonly …
What matters in learning from offline human demonstrations for robot manipulation
Imitating human demonstrations is a promising approach to endow robots with various
manipulation capabilities. While recent advances have been made in imitation learning and …
manipulation capabilities. While recent advances have been made in imitation learning and …
Foundation models for decision making: Problems, methods, and opportunities
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …
capabilities in a wide range of vision and language tasks. When such models are deployed …
Accelerating reinforcement learning with learned skill priors
Intelligent agents rely heavily on prior experience when learning a new task, yet most
modern reinforcement learning (RL) approaches learn every task from scratch. One …
modern reinforcement learning (RL) approaches learn every task from scratch. One …