Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Subgaussian and differentiable importance sampling for off-policy evaluation and learning

AM Metelli, A Russo, M Restelli - Advances in neural …, 2021 - proceedings.neurips.cc
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …
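As background for the importance-sampling estimators this line of work builds on, here is a minimal sketch of plain and self-normalized IS for a mean under a distribution shift; the Gaussian target/behavior pair and sample size are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Goal: estimate E_target[x] for target = N(1, 1),
# using only samples drawn from a behavior distribution N(0, 1).
behavior_samples = rng.normal(0.0, 1.0, size=100_000)

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Importance weights: ratio of target density to behavior density.
w = normal_pdf(behavior_samples, 1.0, 1.0) / normal_pdf(behavior_samples, 0.0, 1.0)

# Plain IS estimate and its self-normalized variant (lower variance, small bias).
is_estimate = np.mean(w * behavior_samples)
snis_estimate = np.sum(w * behavior_samples) / np.sum(w)
```

Both estimates should be close to the true target mean of 1.0; the heavy-tailed behavior of the weights `w` is exactly the variance issue the subgaussian IS corrections discussed above are designed to address.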

Exponential smoothing for off-policy learning

I Aouali, VE Brunel, D Rohde… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by
minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we …
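Since the snippet centers on the inverse propensity scoring (IPS) estimator for logged bandit data, a short sketch may help; the uniform behavior policy, reward model, and target policy below are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50_000, 3  # logged rounds, number of actions

# Logged bandit feedback: actions sampled from a uniform behavior policy.
behavior_probs = np.full(k, 1.0 / k)
actions = rng.integers(0, k, size=n)
# Hypothetical reward model: action 2 pays best.
rewards = (actions == 2).astype(float) * 0.8 + rng.uniform(0.0, 0.2, size=n)

# Target policy to evaluate offline: mostly plays action 2.
target_probs = np.array([0.1, 0.1, 0.8])

# IPS estimate of the target policy's expected reward:
# reweight each logged reward by target_prob / behavior_prob of the taken action.
w = target_probs[actions] / behavior_probs[actions]
ips_value = np.mean(w * rewards)
```

Here the true target-policy value is 0.8 * 0.9 + 0.2 * 0.1 = 0.74, and `ips_value` concentrates around it; clipping or smoothing the weights `w` (as in the exponential smoothing above) trades a little bias for reduced variance.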

Importance sampling techniques for policy optimization

AM Metelli, M Papini, N Montali, M Restelli - Journal of Machine Learning …, 2020 - jmlr.org
How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …

Adaptive instrument design for indirect experiments

Y Chandak, S Shankar, V Syrgkanis… - The Twelfth International …, 2023 - openreview.net
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …

On the relation between policy improvement and off-policy minimum-variance policy evaluation

AM Metelli, S Meta, M Restelli - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …

Goal-conditioned generators of deep policies

F Faccio, V Herrmann, A Ramesh, L Kirsch… - Proceedings of the …, 2023 - ojs.aaai.org
Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies,
given goals encoded in special command inputs. Here we study goal-conditioned neural …

Lifelong hyper-policy optimization with multiple importance sampling regularization

P Liotet, F Vidaich, AM Metelli, M Restelli - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for
current reinforcement learning algorithms. Yet this would be a much needed feature for …

IWDA: Importance weighting for drift adaptation in streaming supervised learning problems

F Fedeli, AM Metelli, F Trovò… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Distribution drift is an important issue for practical applications of machine learning (ML). In
particular, in streaming ML, the data distribution may change over time, yielding the problem …

Delays in reinforcement learning

P Liotet - arXiv preprint arXiv:2309.11096, 2023 - arxiv.org
Delays are inherent to most dynamical systems. Besides shifting the process in time, they
can significantly affect their performance. For this reason, it is usually valuable to study the …