A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - ar…
… accurate off-policy estimators is crucial for both evaluating and optimizing for
new policies. The main challenge in off-policy estimation is the distribution shift between the …