Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
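For context, the baseline that such predictions build on is the importance-sampling estimate of a new policy's expected return from trajectories logged under a behavior policy. A minimal sketch (all function and variable names below are illustrative, not taken from the paper):

    import numpy as np

    def is_return_estimate(trajectories, pi_e, pi_b, gamma=0.99):
        """Ordinary importance-sampling estimate of the evaluation policy's
        expected return from behavior-policy data. Each trajectory is a list
        of (state, action, reward) tuples; pi_e(a, s) and pi_b(a, s) return
        action probabilities under the evaluation and behavior policies."""
        estimates = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(traj):
                weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
                ret += (gamma ** t) * r
            estimates.append(weight * ret)
        return float(np.mean(estimates))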
Subgaussian and differentiable importance sampling for off-policy evaluation and learning
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …
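The heavy-tail behavior of raw likelihood ratios is what studies like this one document; a common baseline remedy, distinct from the subgaussian corrections the title refers to, is simply truncating the weights. A hedged sketch (threshold and names illustrative):

    import numpy as np

    def truncated_is_estimate(rewards, pi_e_probs, pi_b_probs, threshold=10.0):
        """Off-policy value estimate with truncated importance weights:
        clipping at `threshold` tames the heavy right tail of the weight
        distribution, trading a little bias for much lower variance."""
        weights = np.asarray(pi_e_probs) / np.asarray(pi_b_probs)
        return float(np.mean(np.minimum(weights, threshold) * np.asarray(rewards)))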
Exponential smoothing for off-policy learning
Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by
minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we …
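As a rough picture of what smoothing the IPS estimator can look like (a generic sketch of the idea, not necessarily the exact estimator proposed in the paper): raising each importance weight to a power alpha in [0, 1] interpolates between vanilla, unbiased IPS (alpha = 1) and a constant-weight estimator (alpha = 0).

    import numpy as np

    def smoothed_ips_risk(costs, pi_probs, pi0_probs, alpha=0.7):
        """IPS risk estimate with exponentiated importance weights.
        alpha = 1 recovers vanilla IPS; smaller alpha shrinks extreme
        weights, lowering variance at the price of some bias."""
        weights = (np.asarray(pi_probs) / np.asarray(pi0_probs)) ** alpha
        return float(np.mean(weights * np.asarray(costs)))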
Importance sampling techniques for policy optimization
How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …
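Reusing samples collected by several past policies is typically handled with multiple importance sampling under the balance heuristic, which weights each sample by the mixture of all the sampling densities. A small sketch (names and signatures are illustrative):

    import numpy as np

    def balance_heuristic_estimate(samples, returns, target_pdf, behavior_pdfs, counts):
        """Multiple-importance-sampling estimate of the target policy's
        expected return. counts[j] samples were drawn from behavior_pdfs[j];
        each sample is weighted by target density over the mixture density."""
        total = sum(counts)
        estimates = []
        for x, g in zip(samples, returns):
            mixture = sum(n * pdf(x) for n, pdf in zip(counts, behavior_pdfs)) / total
            estimates.append(target_pdf(x) / mixture * g)
        return float(np.mean(estimates))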
Adaptive instrument design for indirect experiments
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …
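In an indirect experiment the experimenter randomizes an instrument (e.g., an encouragement to take the treatment) rather than the treatment itself; the textbook estimator in that setting is two-stage least squares. A self-contained sketch on synthetic data (all coefficients illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    z = rng.binomial(1, 0.5, n).astype(float)    # randomized instrument
    u = rng.normal(size=n)                       # unobserved confounder
    t = 0.8 * z + 0.5 * u + rng.normal(size=n)   # treatment uptake, confounded
    y = 2.0 * t + u + rng.normal(size=n)         # outcome; true effect is 2.0

    # Stage 1: project the treatment on the instrument; stage 2: regress the
    # outcome on the projected treatment. OLS of y on t would be biased by u.
    t_hat = np.polyval(np.polyfit(z, t, 1), z)
    effect = np.polyfit(t_hat, y, 1)[0]
    print(f"2SLS estimate of the treatment effect: {effect:.2f}")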
On the relation between policy improvement and off-policy minimum-variance policy evaluation
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …
Goal-conditioned generators of deep policies
Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies,
given goals encoded in special command inputs. Here we study goal-conditioned neural …
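A minimal picture of the "command input" idea: the goal (e.g., a desired return) is fed to the network alongside the observation. The sketch below is a generic goal-conditioned policy, not the policy-generator architecture the paper studies:

    import numpy as np

    def goal_conditioned_policy(obs, command, weights):
        """Tiny goal-conditioned policy: the command is concatenated with
        the observation and mapped to action logits. Shapes illustrative."""
        x = np.concatenate([obs, command])
        h = np.tanh(weights["W1"] @ x + weights["b1"])
        return weights["W2"] @ h + weights["b2"]  # action logits

    rng = np.random.default_rng(0)
    weights = {"W1": rng.normal(size=(32, 6)), "b1": np.zeros(32),
               "W2": rng.normal(size=(4, 32)), "b2": np.zeros(4)}
    logits = goal_conditioned_policy(np.zeros(5), np.array([10.0]), weights)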
Lifelong hyper-policy optimization with multiple importance sampling regularization
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for
current reinforcement learning algorithms. Yet this would be a much-needed feature for …
IWDA: Importance weighting for drift adaptation in streaming supervised learning problems
Distribution drift is an important issue for practical applications of machine learning (ML). In
particular, in streaming ML, the data distribution may change over time, yielding the problem …
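The mechanism behind importance weighting for drift adaptation is reweighting past samples by the density ratio between the current and the old input distributions, so that a model fit on the reweighted stream targets the current risk. A sketch that assumes the two densities are known (in practice the ratio itself must be estimated):

    import numpy as np

    def drift_weighted_risk(losses, p_current, p_past):
        """Importance-weighted empirical risk: losses were observed under
        density p_past but we want the risk under p_current; both arrays
        hold the densities evaluated at each stored sample."""
        w = np.asarray(p_current) / np.asarray(p_past)  # density-ratio weights
        return float(np.mean(w * np.asarray(losses)))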
Delays in reinforcement learning
P. Liotet, arXiv preprint arXiv:2309.11096, 2023
Delays are inherent to most dynamical systems. Besides shifting the process in time, delays
can significantly affect a system's performance. For this reason, it is usually valuable to study the …
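A standard device covered in work on delayed RL is state augmentation: with a constant action delay of d steps, the pair (current observation, last d queued actions) is again Markov, so an agent can act on that augmented state. A minimal sketch of the bookkeeping (the wrapped env's step() is assumed to return (obs, reward, done)):

    from collections import deque

    class DelayedActionEnv:
        """Wraps an environment so the agent's action takes effect `delay`
        steps later; acting on the augmented state (observation, pending
        actions) restores the Markov property."""
        def __init__(self, env, delay, noop_action):
            self.env, self.delay, self.noop = env, delay, noop_action
            self.pending = deque()

        def reset(self):
            self.pending = deque([self.noop] * self.delay)
            return (self.env.reset(), tuple(self.pending))

        def step(self, action):
            self.pending.append(action)        # queue the agent's new action
            executed = self.pending.popleft()  # the oldest queued action fires
            obs, reward, done = self.env.step(executed)
            return (obs, tuple(self.pending)), reward, done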