- Academic Search

Is conditional generative modeling all you need for decision-making?

A Ajay, Y Du, A Gupta, J Tenenbaum… - arXiv preprint arXiv …, 2022 - arxiv.org

Recent improvements in conditional generative modeling have made it possible to generate
high-quality images from language descriptions alone. We investigate whether these …

Cited by 357, 4 versions

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Cited by 790, 9 versions

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Cited by 110, 10 versions

Curriculum reinforcement learning using optimal transport via gradual domain adaptation

P Huang, M Xu, J Zhu, L Shi… - Advances in Neural …, 2022 - proceedings.neurips.cc

Curriculum Reinforcement Learning (CRL) aims to create a sequence of tasks,
starting from easy ones and gradually learning towards difficult tasks. In this work, we focus …

Cited by 31, 7 versions

Offline reinforcement learning as anti-exploration

S Rezaeifar, R Dadashi, N Vieillard… - Proceedings of the …, 2022 - ojs.aaai.org

Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed
dataset, without interactions with the system. An agent in this setting should avoid selecting …

Cited by 61, 11 versions

Offline reinforcement learning with value-based episodic memory

X Ma, Y Yang, H Hu, Q Liu, J Yang, C Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org

Offline reinforcement learning (RL) shows promise of applying RL to real-world problems by
effectively utilizing previously collected data. Most existing offline RL algorithms use …

Cited by 47, 3 versions

Offline reinforcement learning with soft behavior regularization

H Xu, X Zhan, J Li, H Yin - arXiv preprint arXiv:2110.07395, 2021 - arxiv.org

Most prior approaches to offline reinforcement learning (RL) utilize behavior
regularization, typically augmenting existing off-policy actor critic algorithms with a penalty …

Cited by 31, 3 versions

State-action similarity-based representations for off-policy evaluation

B Pavse, J Hanna - Advances in Neural Information …, 2024 - proceedings.neurips.cc

In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the
expected return of an evaluation policy given a fixed dataset that was collected by running …

Cited by 6, 5 versions

Modified DDPG car-following model with a real-world human driving experience with CARLA simulator

D Li, O Okhrin - Transportation research part C: emerging technologies, 2023 - Elsevier

In the autonomous driving field, fusion of human knowledge into Deep Reinforcement
Learning (DRL) is often based on the human demonstration recorded in a simulated …

Cited by 36, 9 versions

Provably efficient offline reinforcement learning with trajectory-wise reward

T Xu, Y Wang, S Zou, Y Liang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The remarkable success of reinforcement learning (RL) heavily relies on observing the
reward of every visited state-action pair. In many real world applications, however, an agent …

Cited by 19, 2 versions

Offline reinforcement learning with pseudometric learning

Is conditional generative modeling all you need for decision-making?