A review of uncertainty for deep reinforcement learning

O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org
Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …

Mildly conservative q-learning for offline reinforcement learning

J Lyu, X Ma, X Li, Z Lu - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) defines the task of learning from a static logged dataset
without continually interacting with the environment. The distribution shift between the …
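Several of the entries above revolve around the same idea: when learning from a static log, value estimates for actions the dataset does not cover should be treated conservatively. The sketch below is a generic, illustrative tabular example of that idea only; it is not the MCQ algorithm from the cited paper, and the dataset, constants, and unseen-action penalty are all assumptions made for illustration.

import numpy as np

# Illustrative sketch: conservative offline Q-learning on a toy tabular MDP.
# Not the algorithm from any cited paper; it only shows the shared idea of
# penalizing out-of-distribution (OOD) actions when learning from a fixed log.

n_states, n_actions = 5, 3
rng = np.random.default_rng(0)

# A static logged dataset of (s, a, r, s') transitions, e.g. from a behavior policy.
dataset = [
    (rng.integers(n_states), rng.integers(n_actions), rng.normal(), rng.integers(n_states))
    for _ in range(200)
]

Q = np.zeros((n_states, n_actions))
gamma, lr, penalty = 0.99, 0.1, 1.0   # penalty weight is an assumed hyperparameter

# Count which (s, a) pairs appear in the log, i.e. which actions are "in-distribution".
counts = np.zeros((n_states, n_actions))
for s, a, _, _ in dataset:
    counts[s, a] += 1

for _ in range(50):                   # sweeps over the fixed dataset, no new interaction
    for s, a, r, s_next in dataset:
        target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (target - Q[s, a])
    # Conservative step: push down Q-values of state-action pairs never seen in the log.
    Q[counts == 0] -= lr * penalty

greedy_policy = Q.argmax(axis=1)
print(greedy_policy)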

RORL: Robust offline reinforcement learning via conservative smoothing

R Yang, C Bai, X Ma, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) provides a promising direction to exploit massive amounts
of offline data for complex decision-making tasks. Due to the distribution shift issue, current …

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, J Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-
based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Reinforcement learning applied to wastewater treatment process control optimization: Approaches, challenges, and path forward

HC Croll, K Ikuma, SK Ong, S Sarkar - Critical Reviews in …, 2023 - Taylor & Francis
Wastewater treatment process control optimization is a complex task in a highly nonlinear
environment. Reinforcement learning (RL) is a machine learning technique that stands out …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Reinforcement learning with human feedback: Learning dynamic choices via pessimism

Z Li, Z Yang, M Wang - arXiv preprint arXiv:2305.18438, 2023 - arxiv.org
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where
we aim to learn the human's underlying reward and the MDP's optimal policy from a set of …

Model-Bellman inconsistency for model-based offline reinforcement learning

Y Sun, J Zhang, C Jia, H Lin, J Ye… - … Conference on Machine …, 2023 - proceedings.mlr.press
For offline reinforcement learning (RL), model-based methods are expected to be data-
efficient as they incorporate dynamics models to generate more data. However, due to …

Design from policies: Conservative test-time adaptation for offline policy optimization

J Liu, H Zhang, Z Zhuang, Y Kang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In this work, we decouple the iterative bi-level offline RL (value estimation and policy
extraction) from the offline training phase, forming a non-iterative bi-level paradigm and …

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …