A survey on causal reinforcement learning

Y Zeng, R Cai, F Sun, L Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
While reinforcement learning (RL) achieves tremendous success in sequential decision-making
problems of many domains, it still faces key challenges of data inefficiency and the …

On the opportunities and challenges of offline reinforcement learning for recommender systems

X Chen, S Wang, J McAuley, D Jannach… - ACM Transactions on …, 2024 - dl.acm.org
Reinforcement learning serves as a potent tool for modeling dynamic user interests within
recommender systems, garnering increasing research attention of late. However, a …

Provably mitigating overoptimization in RLHF: Your SFT loss is implicitly an adversarial regularizer

Z Liu, M Lu, S Zhang, B Liu, H Guo, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning generative models with human preference via RLHF typically suffers from
overoptimization, where an imperfectly learned reward model can misguide the generative …

Structure in deep reinforcement learning: A survey and open problems

A Mohan, A Zhang, M Lindauer - Journal of Artificial Intelligence Research, 2024 - jair.org
Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural
Networks (DNNs) for function approximation, has demonstrated considerable success in …

Provably efficient causal reinforcement learning with confounded observational data

L Wang, Z Yang, Z Wang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Empowered by neural networks, deep reinforcement learning (DRL) achieves tremendous
empirical success. However, DRL requires a large dataset by interacting with the …

A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes

C Shi, M Uehara, J Huang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …

Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes

A Bennett, N Kallus - Operations Research, 2024 - pubsonline.informs.org
In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …

An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-collected
observational data generated by a potentially different behavior policy. In many …

Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Minimax Instrumental Variable Regression and Convergence Guarantees without Identification or Closedness

A Bennett, N Kallus, X Mao, W Newey… - The Thirty Sixth …, 2023 - proceedings.mlr.press
In this paper, we study nonparametric estimation of instrumental variable (IV) regressions.
Recently, many flexible machine learning methods have been developed for instrumental …