Off-policy actor-critic with emphatic weightings
A variety of theoretically-sound policy gradient algorithms exist for the on-policy setting due
to the policy gradient theorem, which provides a simplified form for the gradient. The off …
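For context, the "simplified form" the snippet refers to is the standard statement of the policy gradient theorem, which expresses the gradient of the return objective without differentiating through the state distribution:

```latex
\nabla_\theta J(\theta)
  = \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi_\theta(\cdot \mid s)}
    \left[ \nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi}(s, a) \right]
```

Here $d^{\pi}$ is the (discounted) state visitation distribution under $\pi$ and $Q^{\pi}$ is the action-value function; in the off-policy setting this expectation is taken under the behavior policy's state distribution, which is the mismatch the emphatic-weighting approach addresses.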
Curious Explorer: a provable exploration strategy in Policy Learning
A coverage assumption is critical with policy gradient methods, because while the objective
function is insensitive to updates in unlikely states, the agent may need improvements in …