Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …
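The entry above centers on the pessimism principle for offline RL with linear function approximation. As a purely illustrative aside, the sketch below shows the generic lower-confidence-bound penalty that this line of work starts from, not the variance-weighted algorithm of the paper itself; the function name `pessimistic_q_estimate` and parameters `beta`, `lam` are assumptions, and in the theory `beta` is set from a confidence bound rather than hand-picked.

```python
import numpy as np

def pessimistic_q_estimate(Phi, targets, phi_query, beta=1.0, lam=1.0):
    """One step of pessimistic least-squares value estimation (sketch).

    Phi       : (n, d) feature matrix of state-action pairs in the offline dataset
    targets   : (n,) regression targets, e.g. reward + estimated next-state value
    phi_query : (d,) feature of the state-action pair to evaluate
    beta      : scale of the pessimism penalty
    lam       : ridge regularization
    """
    d = Phi.shape[1]
    # Regularized Gram matrix of the offline features.
    Lambda = Phi.T @ Phi + lam * np.eye(d)
    # Ridge-regression estimate of the Q-function parameters.
    w = np.linalg.solve(Lambda, Phi.T @ targets)
    # Elliptical uncertainty of the query point under the offline data coverage.
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lambda, phi_query))
    # Pessimistic (lower-confidence) Q-value: subtract the uncertainty penalty.
    return phi_query @ w - bonus
```

The penalty shrinks in directions that the offline data covers well, so the estimate is only discounted where coverage is poor; a variance-aware method additionally reweights the regression and the bonus by estimated noise variances.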
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …
Learn to match with no regret: Reinforcement learning in Markov matching markets
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …
Posterior sampling with delayed feedback for reinforcement learning with linear function approximation
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …
Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …
Noise-adaptive Thompson sampling for linear contextual bandits
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …
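For context on the entry above, the sketch below is the standard Thompson sampling routine for linear contextual bandits with a Gaussian posterior and a fixed, assumed-known noise scale; the noise-adaptive variant studied in the paper, which handles an unknown noise level, is not reproduced here. The class name and parameters are illustrative.

```python
import numpy as np

class LinearThompsonSampling:
    """Standard Thompson sampling for linear contextual bandits (sketch)."""

    def __init__(self, dim, lam=1.0, noise_scale=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_scale = noise_scale      # assumed reward-noise scale
        self.A = lam * np.eye(dim)          # posterior precision (up to scale)
        self.b = np.zeros(dim)              # accumulated feature-weighted rewards

    def select(self, arm_features):
        """arm_features: (K, d) array, one feature vector per candidate arm."""
        mean = np.linalg.solve(self.A, self.b)
        cov = self.noise_scale ** 2 * np.linalg.inv(self.A)
        theta = self.rng.multivariate_normal(mean, cov)   # posterior sample
        return int(np.argmax(arm_features @ theta))       # act greedily w.r.t. sample

    def update(self, x, reward):
        """x: (d,) feature of the chosen arm; reward: observed payoff."""
        self.A += np.outer(x, x)
        self.b += reward * x
```

A typical round calls `select` on the current context's arm features, plays the returned arm, then calls `update` with that arm's features and the observed reward.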
Sample complexity of offline distributionally robust linear Markov decision processes
In offline reinforcement learning (RL), the absence of active exploration calls for attention on
the model robustness to tackle the sim-to-real gap, where the discrepancy between the …
Provable benefit of multitask representation learning in reinforcement learning
As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, theoretical understanding of its advantage is still …
Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …
Minimax optimal and computationally efficient algorithms for distributionally robust offline reinforcement learning
Distributionally robust offline reinforcement learning (RL), which seeks robust policy training
against environment perturbation by modeling dynamics uncertainty, calls for function …