Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Curriculum reinforcement learning using optimal transport via gradual domain adaptation

P Huang, M Xu, J Zhu, L Shi… - Advances in Neural …, 2022 - proceedings.neurips.cc
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks,
starting from easy ones and gradually learning towards difficult tasks. In this work, we focus …

Offline reinforcement learning as anti-exploration

S Rezaeifar, R Dadashi, N Vieillard… - Proceedings of the …, 2022 - ojs.aaai.org
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed
dataset, without interactions with the system. An agent in this setting should avoid selecting …

Offline reinforcement learning with value-based episodic memory

X Ma, Y Yang, H Hu, Q Liu, J Yang, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …

Offline reinforcement learning with soft behavior regularization

H Xu, X Zhan, J Li, H Yin - arXiv preprint arXiv:2110.07395, 2021 - arxiv.org
Most prior approaches to offline reinforcement learning (RL) utilize behavior
regularization, typically augmenting existing off-policy actor critic algorithms with a penalty …

State-action similarity-based representations for off-policy evaluation

B Pavse, J Hanna - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …

Modified DDPG car-following model with a real-world human driving experience with CARLA simulator

D Li, O Okhrin - Transportation research part C: emerging technologies, 2023 - Elsevier
In the autonomous driving field, fusion of human knowledge into Deep Reinforcement
Learning (DRL) is often based on the human demonstration recorded in a simulated …

Provably efficient offline reinforcement learning with trajectory-wise reward

T Xu, Y Wang, S Zou, Y Liang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The remarkable success of reinforcement learning (RL) heavily relies on observing the
reward of every visited state-action pair. In many real world applications, however, an agent …