A survey on offline reinforcement learning: Taxonomy, review, and open problems
RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …
Pretraining language models with human preferences
Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …
Is conditional generative modeling all you need for decision-making?
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …
OpenChat: Advancing open-source language models with mixed-quality data
Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …
Contrastive learning as goal-conditioned reinforcement learning
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …
OneNet: Enhancing time series forecasting models under concept drift by online ensembling
Online updating of time series forecasting models aims to address the concept drifting
problem by efficiently updating forecasting models based on streaming data. Many …
Autonomous evaluation and refinement of digital agents
We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …
When does return-conditioned supervised learning work for offline reinforcement learning?
Several recent works have proposed a class of algorithms for the offline reinforcement
learning (RL) problem that we will refer to as return-conditioned supervised learning …
Masked trajectory models for prediction, representation, and control
We introduce Masked Trajectory Models (MTM) as a generic abstraction for
sequential decision making. MTM takes a trajectory, such as a state-action sequence, and …
Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …