Policy mirror descent for regularized reinforcement learning: A generalized framework with linear convergence
Policy optimization, which learns the policy of interest by maximizing the value function via
large-scale optimization techniques, lies at the heart of modern reinforcement learning (RL) …
Comprehensive analysis of artificial intelligence techniques for gynaecological cancer: symptoms identification, prognosis and prediction
Gynaecological cancers encompass a spectrum of malignancies affecting the female
reproductive system, comprising the cervix, uterus, ovaries, vulva, vagina, and fallopian …
A general framework of motion planning for redundant robot manipulator based on deep reinforcement learning
Motion planning and its optimization are vital and difficult for redundant robot manipulators in
an environment with obstacles. In this article, a general motion planning framework that …
Soft robots learn to crawl: Jointly optimizing design and control with sim-to-real transfer
This work provides a complete framework for the simulation, co-optimization, and sim-to-real
transfer of the design and control of soft legged robots. The compliance of soft robots …
General Munchausen reinforcement learning with Tsallis Kullback-Leibler divergence
Many policy optimization approaches in reinforcement learning incorporate a Kullback-
Leibler (KL) divergence to the previous policy, to prevent the policy from changing too …
ATAC-based car-following model for level 3 autonomous driving considering driver's acceptance
To date, commercial fully autonomous driving is not realized, while level 3 is the next step in
the development of autonomous driving. At level 3, the vehicle is driving under the control of …
Policy mirror descent inherently explores action space
Explicit exploration in the action space was assumed to be indispensable for online policy
gradient methods to avoid a drastic degradation in sample complexity, for solving general …
Optimal algorithms for stochastic multi-armed bandits with heavy tailed rewards
Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards Kyungjae Lee …
Tsallis and Rényi deformations linked via a new λ-duality
Tsallis and Rényi entropies, which are monotone transformations of each other, are
deformations of the celebrated Shannon entropy. Maximization of these deformed entropies …
Sim-to-real transfer of co-optimized soft robot crawlers
This work provides a complete framework for the simulation, co-optimization, and sim-to-real
transfer of the design and control of soft legged robots. Soft robots have “mechanical …