Google Scholar

On the statistical efficiency of reward-free exploration in non-linear RL

J Chen, A Modi, A Krishnamurthy… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …‏

Cited by 34; all 10 versions

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in neural …, 2023‏ - proceedings.neurips.cc‏

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …‏

Cited by 21; all 11 versions

A primal-dual-critic algorithm for offline constrained reinforcement learning‏

K Hong, Y Li, A Tewari - International Conference on …, 2024‏ - proceedings.mlr.press‏

Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the
expected cumulative reward subject to constraints on expected cumulative cost using an …‏

Cited by 10; all 6 versions

Neural network approximation for pessimistic offline reinforcement learning‏

D Wu, Y Jiao, L Shen, H Yang, X Lu - Proceedings of the AAAI …, 2024‏ - ojs.aaai.org‏

Deep reinforcement learning (RL) has shown remarkable success in specific offline
decision-making scenarios, yet its theoretical guarantees are still under development. Existing works …

Cited by 3; all 7 versions

Offline minimax soft-Q-learning under realizability and partial coverage

M Uehara, N Kallus, JD Lee… - Advances in Neural …, 2023‏ - proceedings.neurips.cc‏

We consider offline reinforcement learning (RL) where we only have access to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …

Cited by 6; all 9 versions

OMPO: A unified framework for RL under policy and dynamics shifts

Y Luo, T Ji, F Sun, J Zhang, H Xu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org

Training reinforcement learning policies using environment interaction data collected from
varying policies or dynamics presents a fundamental challenge. Existing works often …‏

Cited by 1; all 8 versions

A finite-sample analysis of multi-step temporal difference estimates‏

Y Duan, MJ Wainwright - Learning for Dynamics and Control …, 2023‏ - proceedings.mlr.press‏

We consider the problem of estimating the value function of an infinite-horizon
$\gamma$-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a …

Cited by 3; all 2 versions

Policy evaluation from a single path: Multi-step methods, mixing and mis-specification‏

Y Duan, MJ Wainwright - arXiv preprint arXiv:2211.03899, 2022 - arxiv.org

We study non-parametric estimation of the value function of an infinite-horizon
$\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We …

Cited by 4; all 3 versions

Offline Learning for Combinatorial Multi-armed Bandits‏

X Liu, X Dai, J Zuo, S Wang, CJ Wong, J Lui… - arXiv preprint arXiv …, 2025 - arxiv.org

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making
framework, extensively studied over the past decade. However, existing work primarily …‏


Reinforcement learning under general function approximation and novel interaction settings‏

J Chen - 2023‏ - ideals.illinois.edu‏

Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …‏


Bellman residual orthogonalization for offline reinforcement learning

On the statistical efficiency of reward-free exploration in non-linear rl‏
