A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Pretraining language models with human preferences

T Korbak, K Shi, A Chen, RV Bhalerao… - International …, 2023 - proceedings.mlr.press
Abstract Language models (LMs) are pretrained to imitate text from large and diverse
datasets that contain content that would violate human preferences if generated by an LM …

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arxiv preprint arxiv …, 2022 - arxiv.org
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Openchat: Advancing open-source language models with mixed-quality data

G Wang, S Cheng, X Zhan, X Li, S Song… - arxiv preprint arxiv …, 2023 - arxiv.org
Nowadays, open-source large language models like LLaMA have emerged. Recent
developments have incorporated supervised fine-tuning (SFT) and reinforcement learning …

Contrastive learning as goal-conditioned reinforcement learning

B Eysenbach, T Zhang, S Levine… - Advances in Neural …, 2022 - proceedings.neurips.cc
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often …

Onenet: Enhancing time series forecasting models under concept drift by online ensembling

Q Wen, W Chen, L Sun, Z Zhang… - Advances in …, 2023 - proceedings.neurips.cc
Online updating of time series forecasting models aims to address the concept drifting
problem by efficiently updating forecasting models based on streaming data. Many …

Autonomous evaluation and refinement of digital agents

J Pan, Y Zhang, N Tomlin, Y Zhou, S Levine… - arxiv preprint arxiv …, 2024 - arxiv.org
We show that domain-general automatic evaluators can significantly improve the
performance of agents for web navigation and device control. We experiment with multiple …

When does return-conditioned supervised learning work for offline reinforcement learning?

D Brandfonbrener, A Bietti, J Buckman… - Advances in …, 2022 - proceedings.neurips.cc
Several recent works have proposed a class of algorithms for the offline reinforcement
learning (RL) problem that we will refer to as return-conditioned supervised learning …

Masked trajectory models for prediction, representation, and control

P Wu, A Majumdar, K Stone, Y Lin… - International …, 2023 - proceedings.mlr.press
Abstract We introduce Masked Trajectory Models (MTM) as a generic abstraction for
sequential decision making. MTM takes a trajectory, such as a state-action sequence, and …

Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline rl

T Yamagata, A Khalil… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional
policy produces promising results. The Decision Transformer (DT) combines the conditional …