Off-policy actor-critic with emphatic weightings

E Graves, E Imani, R Kumaraswamy, M White - Journal of Machine …, 2023 - jmlr.org
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due
to the policy gradient theorem, which provides a simplified form for the gradient. The off …

Curious Explorer: a provable exploration strategy in Policy Learning

M Miani, M Parton, M Romito - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
A coverage assumption is critical with policy gradient methods, because while the objective
function is insensitive to updates in unlikely states, the agent may need improvements in …