Greedification operators for policy optimization: Investigating forward and reverse KL divergences

A Chan, H Silva, S Lim, T Kozuno, AR Mahmood… - Journal of Machine …, 2022 - jmlr.org
Approximate Policy Iteration (API) algorithms alternate between (approximate) policy
evaluation and (approximate) greedification. Many different approaches have been explored …

QC_SANE: Robust control in DRL using quantile critic with spiking actor and normalized ensemble

S Gupta, G Singal, D Garg… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recently introduced deep reinforcement learning (DRL) techniques in discrete-time have
resulted in significant advances in online games, robotics, and so on. Inspired from recent …