Reinforced self-training (ReST) for language modeling
Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model (LLM) outputs by aligning them with human preferences. We propose a …
Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions
In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …
Supervised pretraining can learn in-context reinforcement learning
Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …
Foundation models for decision making: Problems, methods, and opportunities
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …
STEVE-1: A generative model for text-to-behavior in Minecraft
Constructing AI models that respond to text instructions is challenging, especially for
sequential decision-making tasks. This work introduces an instruction-tuned Video …
On Transforming Reinforcement Learning With Transformers: The Development Trajectory
Transformers, originally devised for natural language processing (NLP), have also achieved
significant successes in computer vision (CV). Due to their strong expressive power …
A policy-guided imitation approach for offline reinforcement learning
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …
CEIL: Generalized contextual imitation learning
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …
Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining
Large transformer models pretrained on offline reinforcement learning datasets have
demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they …
A survey on transformers in reinforcement learning
The Transformer has been considered the dominant neural architecture in NLP and CV, mostly
under supervised settings. Recently, a similar surge of using Transformers has appeared in …