Nearly minimax optimal reinforcement learning for linear Markov decision processes
We study reinforcement learning (RL) with linear function approximation. For episodic
time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …
VOQL: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation
We study time-inhomogeneous episodic reinforcement learning (RL) under general function
approximation and sparse rewards. We design a new algorithm, Variance-weighted …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …
Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency
Recently, several studies \citep{zhou2021nearly, zhang2021variance, kim2021improved,
zhou2022computationally} have provided variance-dependent regret bounds for linear …
Reinforcement learning from human feedback with active queries
Aligning large language models (LLMs) with human preference plays a key role in building
modern generative models and can be achieved by reinforcement learning from human …
Noise-adaptive Thompson sampling for linear contextual bandits
Linear contextual bandits represent a fundamental class of models with numerous
real-world applications, and it is critical to develop algorithms that can effectively manage noise …
Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …
A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation
The exploration-exploitation dilemma has been a central challenge in reinforcement
learning (RL) with complex model classes. In this paper, we propose a new algorithm …
Tackling heavy-tailed rewards in reinforcement learning with function approximation: Minimax optimal and instance-dependent regret bounds
While numerous works have focused on devising efficient algorithms for reinforcement
learning (RL) with uniformly bounded rewards, it remains an open question whether sample …