Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
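For context, the baseline that such predictions build on is the importance-sampling estimate of a new policy's expected return from trajectories logged under a behavior policy. A minimal sketch (all function and variable names below are illustrative, not taken from the paper):

    import numpy as np

    def is_return_estimate(trajectories, pi_e, pi_b, gamma=0.99):
        """Ordinary importance-sampling estimate of the evaluation policy's
        expected return from behavior-policy data. Each trajectory is a list
        of (state, action, reward) tuples; pi_e(a, s) and pi_b(a, s) return
        action probabilities under the evaluation and behavior policies."""
        estimates = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(traj):
                weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
                ret += (gamma ** t) * r
            estimates.append(weight * ret)
        return float(np.mean(estimates))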
Subgaussian and differentiable importance sampling for off-policy evaluation and learning
Importance Sampling (IS) is a widely used building block for a large variety of off-policy
estimation and learning algorithms. However, empirical and theoretical studies have …
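The heavy-tail behavior of raw likelihood ratios is what studies like this one document; a common baseline remedy, distinct from the subgaussian corrections the title refers to, is simply truncating the weights. A hedged sketch (threshold and names illustrative):

    import numpy as np

    def truncated_is_estimate(rewards, pi_e_probs, pi_b_probs, threshold=10.0):
        """Off-policy value estimate with truncated importance weights:
        clipping at `threshold` tames the heavy right tail of the weight
        distribution, trading a little bias for much lower variance."""
        weights = np.asarray(pi_e_probs) / np.asarray(pi_b_probs)
        return float(np.mean(np.minimum(weights, threshold) * np.asarray(rewards)))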
Exponential smoothing for off-policy learning
Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by
minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we …
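As a rough picture of what smoothing the IPS estimator can look like (a generic sketch of the idea, not necessarily the exact estimator proposed in the paper): raising each importance weight to a power alpha in [0, 1] interpolates between vanilla, unbiased IPS (alpha = 1) and a constant-weight estimator (alpha = 0).

    import numpy as np

    def smoothed_ips_risk(costs, pi_probs, pi0_probs, alpha=0.7):
        """IPS risk estimate with exponentiated importance weights.
        alpha = 1 recovers vanilla IPS; smaller alpha shrinks extreme
        weights, lowering variance at the price of some bias."""
        weights = (np.asarray(pi_probs) / np.asarray(pi0_probs)) ** alpha
        return float(np.mean(weights * np.asarray(costs)))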
Importance sampling techniques for policy optimization
How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …
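Reusing samples collected by several past policies is typically handled with multiple importance sampling under the balance heuristic, which weights each sample by the mixture of all the sampling densities. A small sketch (names and signatures are illustrative):

    import numpy as np

    def balance_heuristic_estimate(samples, returns, target_pdf, behavior_pdfs, counts):
        """Multiple-importance-sampling estimate of the target policy's
        expected return. counts[j] samples were drawn from behavior_pdfs[j];
        each sample is weighted by target density over the mixture density."""
        total = sum(counts)
        estimates = []
        for x, g in zip(samples, returns):
            mixture = sum(n * pdf(x) for n, pdf in zip(counts, behavior_pdfs)) / total
            estimates.append(target_pdf(x) / mixture * g)
        return float(np.mean(estimates))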
Adaptive instrument design for indirect experiments
Indirect experiments provide a valuable framework for estimating treatment effects in
situations where conducting randomized control trials (RCTs) is impractical or unethical …
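In an indirect experiment the experimenter randomizes an instrument (e.g., an encouragement to take the treatment) rather than the treatment itself; the textbook estimator in that setting is two-stage least squares. A self-contained sketch on synthetic data (all coefficients illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000
    z = rng.binomial(1, 0.5, n).astype(float)    # randomized instrument
    u = rng.normal(size=n)                       # unobserved confounder
    t = 0.8 * z + 0.5 * u + rng.normal(size=n)   # treatment uptake, confounded
    y = 2.0 * t + u + rng.normal(size=n)         # outcome; true effect is 2.0

    # Stage 1: project the treatment on the instrument; stage 2: regress the
    # outcome on the projected treatment. OLS of y on t would be biased by u.
    t_hat = np.polyval(np.polyfit(z, t, 1), z)
    effect = np.polyfit(t_hat, y, 1)[0]
    print(f"2SLS estimate of the treatment effect: {effect:.2f}")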
On the relation between policy improvement and off-policy minimum-variance policy evaluation
Off-policy methods are the basis of a large number of effective Policy Optimization (PO)
algorithms. In this setting, Importance Sampling (IS) is typically employed for off-policy …
Goal-conditioned generators of deep policies
Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies,
given goals encoded in special command inputs. Here we study goal-conditioned neural …
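A minimal picture of the "command input" idea: the goal (e.g., a desired return) is fed to the network alongside the observation. The sketch below is a generic goal-conditioned policy, not the policy-generator architecture the paper studies:

    import numpy as np

    def goal_conditioned_policy(obs, command, weights):
        """Tiny goal-conditioned policy: the command is concatenated with
        the observation and mapped to action logits. Shapes illustrative."""
        x = np.concatenate([obs, command])
        h = np.tanh(weights["W1"] @ x + weights["b1"])
        return weights["W2"] @ h + weights["b2"]  # action logits

    rng = np.random.default_rng(0)
    weights = {"W1": rng.normal(size=(32, 6)), "b1": np.zeros(32),
               "W2": rng.normal(size=(4, 32)), "b2": np.zeros(4)}
    logits = goal_conditioned_policy(np.zeros(5), np.array([10.0]), weights)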
Lifelong hyper-policy optimization with multiple importance sampling regularization
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for
current reinforcement learning algorithms. Yet this would be a much-needed feature for …
IWDA: Importance weighting for drift adaptation in streaming supervised learning problems
Distribution drift is an important issue for practical applications of machine learning (ML). In
particular, in streaming ML, the data distribution may change over time, yielding the problem …
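The mechanism behind importance weighting for drift adaptation is reweighting past samples by the density ratio between the current and the old input distributions, so that a model fit on the reweighted stream targets the current risk. A sketch that assumes the two densities are known (in practice the ratio itself must be estimated):

    import numpy as np

    def drift_weighted_risk(losses, p_current, p_past):
        """Importance-weighted empirical risk: losses were observed under
        density p_past but we want the risk under p_current; both arrays
        hold the densities evaluated at each stored sample."""
        w = np.asarray(p_current) / np.asarray(p_past)  # density-ratio weights
        return float(np.mean(w * np.asarray(losses)))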
Delays in reinforcement learning
P. Liotet, arXiv preprint arXiv:2309.11096, 2023
Delays are inherent to most dynamical systems. Besides shifting the process in time, delays
can significantly affect a system's performance. For this reason, it is usually valuable to study the …
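A standard device covered in work on delayed RL is state augmentation: with a constant action delay of d steps, the pair (current observation, last d queued actions) is again Markov, so an agent can act on that augmented state. A minimal sketch of the bookkeeping (the wrapped env's step() is assumed to return (obs, reward, done)):

    from collections import deque

    class DelayedActionEnv:
        """Wraps an environment so the agent's action takes effect `delay`
        steps later; acting on the augmented state (observation, pending
        actions) restores the Markov property."""
        def __init__(self, env, delay, noop_action):
            self.env, self.delay, self.noop = env, delay, noop_action
            self.pending = deque()

        def reset(self):
            self.pending = deque([self.noop] * self.delay)
            return (self.env.reset(), tuple(self.pending))

        def step(self, action):
            self.pending.append(action)        # queue the agent's new action
            executed = self.pending.popleft()  # the oldest queued action fires
            obs, reward, done = self.env.step(executed)
            return (obs, tuple(self.pending)), reward, done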