On the statistical efficiency of reward-free exploration in non-linear RL
We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
A primal-dual-critic algorithm for offline constrained reinforcement learning
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the
expected cumulative reward subject to constraints on expected cumulative cost using an …
Neural network approximation for pessimistic offline reinforcement learning
Deep reinforcement learning (RL) has shown remarkable success in specific offline decision-
making scenarios, yet its theoretical guarantees are still under development. Existing works …
Offline minimax soft-Q-learning under realizability and partial coverage
We consider offline reinforcement learning (RL) where we only have access to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …
OMPO: A unified framework for RL under policy and dynamics shifts
Training reinforcement learning policies using environment interaction data collected from
varying policies or dynamics presents a fundamental challenge. Existing works often …
A finite-sample analysis of multi-step temporal difference estimates
We consider the problem of estimating the value function of an infinite-horizon
$\gamma$-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a …
Policy evaluation from a single path: Multi-step methods, mixing and mis-specification
We study non-parametric estimation of the value function of an infinite-horizon
$\gamma$-discounted Markov reward process (MRP) using observations from a single trajectory. We …
Offline Learning for Combinatorial Multi-armed Bandits
The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making
framework, extensively studied over the past decade. However, existing work primarily …
Reinforcement learning under general function approximation and novel interaction settings
J Chen - 2023 - ideals.illinois.edu
Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …