Learn to match with no regret: Reinforcement learning in Markov matching markets
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …
Comprehensive transformer-based model architecture for real-world storm prediction
Storm prediction provides early alerts for preparation, helping to avoid potential damage to
property and harm to human safety. However, a traditional storm prediction model usually incurs …
Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …
Noise-adaptive Thompson sampling for linear contextual bandits
Linear contextual bandits represent a fundamental class of models with numerous real-world
applications, and it is critical to develop algorithms that can effectively manage noise …
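The title refers to Thompson sampling for linear contextual bandits. The noise-adaptive part of the method is not visible in this truncated snippet, so the following is only a minimal sketch of textbook linear Thompson sampling, not the paper's algorithm. It keeps a ridge-regression posterior (Gram matrix V and response vector b), samples a parameter from a Gaussian centred at the ridge estimate with covariance scale² · V⁻¹, and plays the arm with the highest sampled score. The class name, the `reg` and `scale` parameters, and the fixed (non-adaptive) posterior scale are all illustrative assumptions. It is written in pure Python with a hand-rolled Cholesky factorisation to stay dependency-free.

```python
import math
import random
from typing import List, Sequence


class LinearThompsonSampling:
    """Textbook linear Thompson sampling (illustrative sketch, hypothetical names).

    Uses a FIXED posterior scale; it does not adapt to the noise level.
    """

    def __init__(self, dim: int, reg: float = 1.0, scale: float = 1.0, seed: int = 0):
        self.dim = dim
        self.scale = scale  # hypothetical constant posterior scale v
        # Gram matrix V = reg * I + sum x x^T, and response vector b = sum r * x
        self.V = [[reg if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim
        self.rng = random.Random(seed)

    @staticmethod
    def _cholesky(A: List[List[float]]) -> List[List[float]]:
        # Lower-triangular L with A = L L^T (A must be symmetric positive definite).
        n = len(A)
        L = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                s = sum(L[i][k] * L[j][k] for k in range(j))
                if i == j:
                    L[i][j] = math.sqrt(A[i][i] - s)
                else:
                    L[i][j] = (A[i][j] - s) / L[j][j]
        return L

    @staticmethod
    def _forward(L: List[List[float]], rhs: Sequence[float]) -> List[float]:
        # Solve L y = rhs by forward substitution.
        y = [0.0] * len(rhs)
        for i in range(len(rhs)):
            y[i] = (rhs[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
        return y

    @staticmethod
    def _backward(L: List[List[float]], rhs: Sequence[float]) -> List[float]:
        # Solve L^T x = rhs by back substitution.
        n = len(rhs)
        x = [0.0] * n
        for i in reversed(range(n)):
            x[i] = (rhs[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
        return x

    def select(self, arms: Sequence[Sequence[float]]) -> int:
        # Refactorising V each round costs O(d^3); acceptable for a sketch.
        L = self._cholesky(self.V)
        theta_hat = self._backward(L, self._forward(L, self.b))  # V^{-1} b
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        noise = self._backward(L, z)  # L^{-T} z has covariance V^{-1}
        theta = [t + self.scale * e for t, e in zip(theta_hat, noise)]
        scores = [sum(a * t for a, t in zip(arm, theta)) for arm in arms]
        return max(range(len(arms)), key=scores.__getitem__)

    def update(self, x: Sequence[float], reward: float) -> None:
        # Rank-one update of the Gram matrix and the response vector.
        for i in range(self.dim):
            self.b[i] += reward * x[i]
            for j in range(self.dim):
                self.V[i][j] += x[i] * x[j]
```

With two orthogonal arms and a true parameter of (1, 0), the sampled parameter quickly concentrates and the agent settles on the first arm. A noise-adaptive method would replace the fixed `scale` with something driven by the observed noise; that part is deliberately not modelled here.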
Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …
Variance-aware off-policy evaluation with linear function approximation
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …
Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback
In this work, we study low-rank MDPs with adversarially changed losses in the full-information
feedback setting. In particular, the unknown transition probability kernel admits a …
Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …
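The entropic risk measure named in this snippet has a standard closed form: for a random return X and risk parameter β ≠ 0, ρ_β(X) = (1/β) log E[exp(βX)], which tends to E[X] as β → 0. For returns (rewards), Jensen's inequality gives ρ_β(X) ≥ E[X] when β > 0 (risk-seeking) and ρ_β(X) ≤ E[X] when β < 0 (risk-averse). A minimal sketch of the empirical version over equally weighted samples follows; the function name `entropic_risk` is hypothetical and this is not the paper's algorithm, only the risk measure it is based on.

```python
import math
from typing import Sequence


def entropic_risk(samples: Sequence[float], beta: float) -> float:
    """Empirical entropic risk (1/beta) * log mean(exp(beta * x)) (hypothetical helper)."""
    if beta == 0:
        # The beta -> 0 limit is the plain expectation.
        return sum(samples) / len(samples)
    scaled = [beta * x for x in samples]
    m = max(scaled)
    # Log-mean-exp computed stably by factoring out the largest exponent.
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / beta
```

For example, `entropic_risk([0.0, 2.0], 1.0)` equals log((1 + e²)/2) ≈ 1.434, above the sample mean of 1.0, while a negative β pulls the value below the mean. Factoring out the maximum avoids overflow in `math.exp` when |β| times the returns is large.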
Augment online linear optimization with arbitrarily bad machine-learned predictions
The online linear optimization paradigm is important to many real-world network
applications as well as theoretical algorithmic studies. Recent studies have made attempts …
Learning adversarial linear mixture Markov decision processes with bandit feedback and unknown transition
We study reinforcement learning (RL) with linear function approximation, unknown
transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …