Is conditional generative modeling all you need for decision-making?
Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …
Offline reinforcement learning as one big sequence modeling problem
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Curriculum reinforcement learning using optimal transport via gradual domain adaptation
Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks,
starting from easy ones and gradually learning towards difficult tasks. In this work, we focus …
Offline reinforcement learning as anti-exploration
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed
dataset, without interactions with the system. An agent in this setting should avoid selecting …
Offline reinforcement learning with value-based episodic memory
Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …
Offline reinforcement learning with soft behavior regularization
Most prior approaches to offline reinforcement learning (RL) utilize behavior
regularization, typically augmenting existing off-policy actor critic algorithms with a penalty …
State-action similarity-based representations for off-policy evaluation
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …
Modified DDPG car-following model with a real-world human driving experience with CARLA simulator
In the autonomous driving field, fusion of human knowledge into Deep Reinforcement
Learning (DRL) is often based on the human demonstration recorded in a simulated …
Provably efficient offline reinforcement learning with trajectory-wise reward
The remarkable success of reinforcement learning (RL) heavily relies on observing the
reward of every visited state-action pair. In many real world applications, however, an agent …