Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Nearly minimax optimal reinforcement learning for linear mixture markov decision processes
We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
underlying transition probability kernel of the Markov decision process (MDP) is a linear …
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
it is significantly less understood in theory, especially compared with value-based RL. In …
Nearly minimax optimal reinforcement learning for linear markov decision processes
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
Unpacking reward sha**: Understanding the benefits of reward engineering on sample complexity
The success of reinforcement learning in a variety of challenging sequential decision-
making problems has been much discussed, but often ignored in this discussion is the …
making problems has been much discussed, but often ignored in this discussion is the …
Human-in-the-loop: Provably efficient preference-based reinforcement learning with general function approximation
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where
instead of receiving a numeric reward at each step, the RL agent only receives preferences …
instead of receiving a numeric reward at each step, the RL agent only receives preferences …
Provably efficient safe exploration via primal-dual policy optimization
We study the safe reinforcement learning problem using the constrained Markov decision
processes in which an agent aims to maximize the expected total reward subject to a safety …
processes in which an agent aims to maximize the expected total reward subject to a safety …
Leveraging offline data in online reinforcement learning
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
Reinforcement learning with general value function approximation: Provably efficient approach via bounded eluder dimension
Value function approximation has demonstrated phenomenal empirical success in
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …
reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …
A theoretical analysis of optimistic proximal policy optimization in linear markov decision processes
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …
Reason for future, act for now: A principled framework for autonomous llm agents with provable sample efficiency
Large language models (LLMs) demonstrate impressive reasoning abilities, but translating
reasoning into actions in the real world remains challenging. In particular, it remains unclear …
reasoning into actions in the real world remains challenging. In particular, it remains unclear …