Stochastic rising bandits
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …
Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits
Web-based applications such as chatbots, search engines and news recommendations
continue to grow in scale and complexity with the recent surge in the adoption of large …
Model-Based Best Arm Identification for Decreasing Bandits
S Takemori, Y Umeda… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-
armed bandit when the expected loss of each arm is monotone decreasing as a function of …
Best Arm Identification for Stochastic Rising Bandits
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the
expected rewards of the available options increase every time they are selected. This setting …
Budgeted Online Model Selection and Fine-Tuning via Federated Learning
Online model selection involves selecting a model from a set of candidate models 'on the
fly' to perform prediction on a stream of data. The choice of candidate models henceforth has …
Rising Rested Bandits: Lower Bounds and Efficient Algorithms
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …
Rising Rested MAB with Linear Drift
O Amichay, Y Mansour - arXiv preprint arXiv:2501.04403, 2025 - arxiv.org
We consider non-stationary multi-arm bandit (MAB) where the expected reward of each
action follows a linear function of the number of times we executed the action. Our main …
Convergence-Aware Online Model Selection with Time-Increasing Bandits
Web-based applications such as chatbots, search engines and news recommendations
continue to grow in scale and complexity with the recent surge in the adoption of large …
Scalable Online Decision Making: Algorithm Design and Fundamental Limits
PM Ghari - 2024 - search.proquest.com
Decision-making and real-time prediction in non-stationary and dynamic environments
present significant challenges for the application of machine learning and artificial …
Scalable Online Decision Making: Algorithm Design and Fundamental Limits
P Mollaebrahim Ghari - 2024 - escholarship.org
Decision-making and real-time prediction in non-stationary and dynamic environments
present significant challenges for the application of machine learning and artificial …