Policy mirror descent for regularized reinforcement learning: A generalized framework with linear convergence

W Zhan, S Cen, B Huang, Y Chen, JD Lee… - SIAM Journal on …, 2023 - SIAM
Policy optimization, which learns the policy of interest by maximizing the value function via
large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL) …

Comprehensive analysis of artificial intelligence techniques for gynaecological cancer: symptoms identification, prognosis and prediction

S Gandotra, Y Kumar, N Modi, J Choi, J Shafi… - Artificial Intelligence …, 2024 - Springer
Gynaecological cancers encompass a spectrum of malignancies affecting the female
reproductive system, comprising the cervix, uterus, ovaries, vulva, vagina, and fallopian …

A general framework of motion planning for redundant robot manipulator based on deep reinforcement learning

X Li, H Liu, M Dong - IEEE Transactions on Industrial …, 2021 - ieeexplore.ieee.org
Motion planning and its optimization are vital and difficult for redundant robot manipulators in
an environment with obstacles. In this article, a general motion planning framework that …

Soft robots learn to crawl: Jointly optimizing design and control with sim-to-real transfer

C Schaff, A Sedal, MR Walter - arXiv preprint arXiv:2202.04575, 2022 - arxiv.org
This work provides a complete framework for the simulation, co-optimization, and sim-to-real
transfer of the design and control of soft legged robots. The compliance of soft robots …

General Munchausen reinforcement learning with Tsallis Kullback-Leibler divergence

L Zhu, Z Chen, M Schlegel… - Advances in Neural …, 2024 - proceedings.neurips.cc
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler
(KL) divergence to the previous policy, to prevent the policy from changing too …

ATAC-based car-following model for level 3 autonomous driving considering driver's acceptance

TQ Tang, Y Gui, J Zhang - IEEE transactions on intelligent …, 2021 - ieeexplore.ieee.org
To date, commercial fully autonomous driving has not been realized, while level 3 is the next step in
the development of autonomous driving. At level 3, the vehicle is driving under the control of …

Policy mirror descent inherently explores action space

Y Li, G Lan - SIAM Journal on Optimization, 2025 - SIAM
Explicit exploration in the action space was assumed to be indispensable for online policy
gradient methods to avoid a drastic degradation in sample complexity, for solving general …

Optimal algorithms for stochastic multi-armed bandits with heavy tailed rewards

K Lee, H Yang, S Lim, S Oh - Advances in Neural …, 2020 - proceedings.neurips.cc
Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards. Kyungjae Lee …

Tsallis and Rényi deformations linked via a new λ-duality

TKL Wong, J Zhang - IEEE Transactions on Information Theory, 2022 - ieeexplore.ieee.org
Tsallis and Rényi entropies, which are monotone transformations of each other, are
deformations of the celebrated Shannon entropy. Maximization of these deformed entropies …

Sim-to-real transfer of co-optimized soft robot crawlers

C Schaff, A Sedal, S Ni, MR Walter - Autonomous Robots, 2023 - Springer
This work provides a complete framework for the simulation, co-optimization, and sim-to-real
transfer of the design and control of soft legged robots. Soft robots have “mechanical …