Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Advances of machine learning in materials science: Ideas and techniques
In this big data era, the use of large datasets in conjunction with machine learning (ML) has
been increasingly popular in both industry and academia. In recent times, the field of …
Efficient and targeted COVID-19 border testing via reinforcement learning
Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a
variety of ad hoc border control protocols to allow for non-essential travel while safeguarding …
Federated linear contextual bandits
This paper presents a novel federated linear contextual bandits model, where individual
clients face different $ K $-armed stochastic bandits coupled through common global …
Feedback efficient online fine-tuning of diffusion models
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …
Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
The sample complexity of online contract design
We study the hidden-action principal-agent problem in an online setting. In each round, the
principal posts a contract that specifies the payment to the agent based on each outcome …
Multi-armed bandit experimental design: Online decision-making and adaptive inference
D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …
Provably efficient Q-learning with low switching cost
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change their exploration policy as infrequently as possible during regret …
Inference for batched bandits
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …