Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 …

Provably efficient safe exploration via primal-dual policy optimization

D Ding, X Wei, Z Yang, Z Wang… - … conference on artificial …, 2021 - proceedings.mlr.press
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …

Almost optimal model-free reinforcement learning via reference-advantage decomposition

Z Zhang, Y Zhou, X Ji - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …

Deployment-efficient reinforcement learning via model-based offline optimization

T Matsushima, H Furuta, Y Matsuo, O Nachum… - arXiv preprint arXiv …, 2020 - arxiv.org
Most reinforcement learning (RL) algorithms assume online access to the environment, in
which one may readily interleave updates to the policy with experience collection using that …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Understanding domain randomization for sim-to-real transfer

X Chen, J Hu, C Jin, L Li, L Wang - arXiv preprint arXiv:2110.03239, 2021 - arxiv.org
Reinforcement learning encounters many challenges when applied directly in the real world.
Sim-to-real transfer is widely used to transfer the knowledge learned from simulation to the …