Frontiers in service science: Data-driven revenue management: The interplay of data, model, and decisions
Revenue management (RM) is the application of analytical methodologies and tools that
predict consumer behavior and optimize product availability and prices to maximize a firm's …
Linear bandits with limited adaptivity and learning distributional optimal design
Motivated by practical needs such as large-scale learning, we study the impact of adaptivity
constraints to linear contextual bandits, a central problem in online learning and decision …
Near-optimal regret bounds for multi-batch reinforcement learning
In this paper, we study the episodic reinforcement learning (RL) problem modeled by finite-
horizon Markov Decision Processes (MDPs) with constraint on the number of batches. The …
Towards scalable and robust structured bandits: A meta-learning framework
Online learning in large-scale structured bandits is known to be challenging due to the curse
of dimensionality. In this paper, we propose a unified meta-learning framework for a wide …
Phase transitions and cyclic phenomena in bandits with switching constraints
We consider the classical stochastic multi-armed bandit problem with a constraint on the
total cost incurred by switching between actions. Under the unit switching cost structure …
Conservative exploration in reinforcement learning
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …
UCB-based algorithms for multinomial logistic regression bandits
Out of the rich family of generalized linear bandits, perhaps the most well studied ones are
logistic bandits that are used in problems with binary rewards: for instance, when the learner …
Online convex optimization with continuous switching constraint
In many sequential decision making applications, the change of decision would bring an
additional cost, such as the wear-and-tear cost associated with changing server status. To …
Contextual multinomial logit bandits with general value functions
Contextual multinomial logit (MNL) bandits capture many real-world assortment
recommendation problems such as online retailing/advertising. However, prior work has …
Reinforcement learning with logarithmic regret and policy switches
In this paper, we study the problem of regret minimization for episodic Reinforcement
Learning (RL) both in the model-free and the model-based setting. We focus on learning …