A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
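As an illustration of the basic estimator this survey covers, the following is a minimal sketch of trajectory-wise importance-sampling off-policy evaluation, assuming logged trajectories that record the behaviour policy's action probabilities; the function and argument names are illustrative, not taken from the survey.

```python
import numpy as np

def is_ope(trajectories, target_policy, gamma=0.99):
    """Trajectory-wise importance-sampling estimate of a target policy's value.

    Each trajectory is a list of (state, action, reward, behaviour_prob) tuples,
    where behaviour_prob is the probability the logging policy assigned to the
    logged action. target_policy(state, action) returns the target policy's
    probability of that action.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r, b_prob) in enumerate(traj):
            weight *= target_policy(s, a) / b_prob  # cumulative importance ratio
            ret += (gamma ** t) * r                 # discounted return of the trajectory
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```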

Counterfactual learning and evaluation for recommender systems: Foundations, implementations, and recent advances

Y Saito, T Joachims - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Counterfactual estimators enable the use of existing log data to estimate how some new
target recommendation policy would have performed, if it had been used instead of the …
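A minimal sketch of the inverse propensity scoring (IPS) estimator that this line of counterfactual evaluation builds on, assuming logged bandit feedback of the form (context, action, reward, logging propensity); the self-normalised variant (SNIPS) is included as a common variance-reduction choice. All names here are illustrative.

```python
import numpy as np

def ips_estimate(logs, target_prob, self_normalise=False):
    """Estimate a new recommendation policy's expected reward from logged data.

    logs: iterable of (context, action, reward, logging_propensity) tuples.
    target_prob(context, action): probability the target policy shows `action`.
    """
    weights, rewards = [], []
    for x, a, r, p_log in logs:
        weights.append(target_prob(x, a) / p_log)  # importance weight
        rewards.append(r)
    weights, rewards = np.asarray(weights), np.asarray(rewards)
    if self_normalise:
        return float(np.sum(weights * rewards) / np.sum(weights))  # SNIPS
    return float(np.mean(weights * rewards))                       # vanilla IPS
```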

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press
This paper develops the first policy gradient method with a global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Improved sample complexity bounds for distributionally robust reinforcement learning

Z Xu, K Panaganti, D Kalathil - International Conference on …, 2023 - proceedings.mlr.press
We consider the problem of learning a control policy that is robust against the parameter
mismatches between the training environment and testing environment. We formulate this as …

Distributionally Robust Q-Learning

Z Liu, Q Bai, J Blanchet, P Dong, W Xu… - International …, 2022 - proceedings.mlr.press
Reinforcement learning (RL) has demonstrated remarkable achievements in simulated
environments. However, carrying this success to real environments requires the important …
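One way to picture distributionally robust Q-learning is a tabular robust Bellman update computed against an uncertainty set around the nominal transition kernel. The sketch below uses a simple R-contamination set of radius delta, which admits a closed-form worst case; this is a simplifying assumption made for illustration, not necessarily the uncertainty set used in the paper.

```python
import numpy as np

def robust_q_iteration(P, R, gamma=0.95, delta=0.1, iters=500):
    """Tabular robust Q-iteration under an R-contamination uncertainty set.

    P: nominal transition tensor of shape (S, A, S); R: rewards of shape (S, A).
    The adversary may replace a delta-fraction of the nominal kernel, so the
    worst case mixes the nominal expectation with the worst next-state value.
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)         # greedy state values
        nominal = P @ V           # E_{s' ~ P(.|s,a)}[V(s')], shape (S, A)
        worst = V.min()           # adversary pushes its mass to the worst state
        Q = R + gamma * ((1.0 - delta) * nominal + delta * worst)
    return Q
```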

Finite-sample regret bound for distributionally robust offline tabular reinforcement learning

Z Zhou, Z Zhou, Q Bai, L Qiu… - International …, 2021 - proceedings.mlr.press
While reinforcement learning has recently witnessed tremendous success in a wide range of
domains, robustness, or the lack thereof, remains an important issue that …

Doubly robust distributionally robust off-policy evaluation and learning

N Kallus, X Mao, K Wang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation and learning (OPE/L) use offline observational data to make better
decisions, which is crucial in applications where online experimentation is limited. However …
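The doubly robust building block combines a reward model with importance weighting, so the estimate remains consistent if either component is accurate. Below is a sketch of the standard doubly robust estimator for logged bandit feedback; the paper's distributionally robust extension, which additionally guards against distribution shift, is not reproduced here, and all names are illustrative.

```python
import numpy as np

def doubly_robust_estimate(logs, target_prob, reward_model, actions):
    """Standard doubly robust value estimate from logged bandit feedback.

    logs: iterable of (context, action, reward, logging_propensity) tuples.
    target_prob(x, a): target policy's probability of action a in context x.
    reward_model(x, a): estimated expected reward for (x, a).
    actions: list of all candidate actions.
    """
    values = []
    for x, a, r, p_log in logs:
        # Model-based term: expected reward of the target policy under the model.
        direct = sum(target_prob(x, ap) * reward_model(x, ap) for ap in actions)
        # Importance-weighted correction on the logged action.
        correction = target_prob(x, a) / p_log * (r - reward_model(x, a))
        values.append(direct + correction)
    return float(np.mean(values))
```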

Toward theoretical understandings of robust markov decision processes: Sample complexity and asymptotics

W Yang, L Zhang, Z Zhang - The Annals of Statistics, Vol. 50, No. 6, 3223–3248, 2022 - projecteuclid.org

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …
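The pessimism idea amounts to acting on a lower confidence bound of the modelled reward rather than its point estimate. As a hedged illustration, the sketch below scores items by a lower quantile of a Beta posterior over their click probability; the Beta(1, 1) prior and quantile level are assumptions made for this example, not the paper's exact construction.

```python
from scipy.stats import beta

def pessimistic_ranking(click_counts, impression_counts, quantile=0.05):
    """Rank items by a lower confidence bound on their click probability.

    click_counts / impression_counts: dicts mapping item id -> observed counts.
    A Beta(1, 1) prior is updated with the counts; items are scored by the
    `quantile` lower tail of the posterior instead of the posterior mean.
    """
    scores = {}
    for item, n in impression_counts.items():
        clicks = click_counts.get(item, 0)
        # Posterior lower quantile of the click-through rate under a uniform prior.
        scores[item] = beta.ppf(quantile, 1 + clicks, 1 + n - clicks)
    return sorted(scores, key=scores.get, reverse=True)
```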

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org
Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …