Google Scholar

An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods

Y Liu, K Zhang, T Basar, W Yin - Advances in Neural …, 2020 - proceedings.neurips.cc

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …

Cited by 123 · All 8 versions

[PDF] aaai.org

Sample efficient reinforcement learning with REINFORCE

J Zhang, J Kim, B O'Donoghue, S Boyd - Proceedings of the AAAI …, 2021 - ojs.aaai.org

Policy gradient methods are among the most effective methods for large-scale reinforcement
learning, and their empirical success has prompted several works that develop the …

Cited by 120 · All 10 versions

[PDF] mlr.press

[PDF] Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …

WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press

We consider the problem of designing sample efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …

Cited by 17 · All 5 versions

[PDF] mlr.press

Momentum-based policy gradient methods

F Huang, S Gao, J Pei, H Huang - … conference on machine …, 2020 - proceedings.mlr.press

In the paper, we propose a class of efficient momentum-based policy gradient methods for
the model-free reinforcement learning, which use adaptive learning rates and do not require …

Cited by 54 · All 6 versions

[PDF] mlr.press

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press

We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

Cited by 20 · All 6 versions

[PDF] mlr.press

PAGE-PG: A simple and loopless variance-reduced policy gradient method with probabilistic gradient estimation

M Gargiani, A Zanelli, A Martinelli… - International …, 2022 - proceedings.mlr.press

Despite their success, policy gradient methods suffer from high variance of the gradient
estimator, which can result in unsatisfactory sample complexity. Recently, numerous …

Cited by 18 · All 6 versions

[PDF] mlr.press

Efficient privacy-preserving stochastic nonconvex optimization

L Wang, B Jayaraman, D Evans… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press

While many solutions for privacy-preserving convex empirical risk minimization (ERM) have
been developed, privacy-preserving nonconvex ERM remains a challenge. We study …

Cited by 33 · All 10 versions

[PDF] springer.com

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer

Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

Cited by 43 · All 11 versions

[PDF] ieee.org

Adaptive stochastic ADMM for decentralized reinforcement learning in edge IoT

W Lei, Y Ye, M Xiao, M Skoglund… - IEEE Internet of Things …, 2022 - ieeexplore.ieee.org

Edge computing provides a promising paradigm to support the implementation of Internet of
Things (IoT) by offloading tasks to nearby edge nodes. Meanwhile, the increasing network …

Cited by 8 · All 2 versions

[PDF] arxiv.org

Dealing with sparse rewards in continuous control robotics via heavy-tailed policies

S Chakraborty, AS Bedi, A Koppel, P Tokekar… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG)
algorithm to deal with the challenges of sparse rewards in continuous control problems …

Cited by 8 · All 6 versions

Stochastic recursive momentum for policy gradient methods

An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods‏
