Stochastic rising bandits

AM Metelli, F Trovo, M Pirola… - … Conference on Machine …, 2022 - proceedings.mlr.press
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …

Which LLM to Play? Convergence-Aware Online Model Selection with Time-Increasing Bandits

Y Xia, F Kong, T Yu, L Guo, RA Rossi, S Kim… - Proceedings of the ACM …, 2024 - dl.acm.org
Web-based applications such as chatbots, search engines and news recommendations
continue to grow in scale and complexity with the recent surge in the adoption of large …

Model-Based Best Arm Identification for Decreasing Bandits

S Takemori, Y Umeda… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We study the problem of reliably identifying the best (lowest loss) arm in a stochastic multi-
armed bandit when the expected loss of each arm is monotone decreasing as a function of …

Best Arm Identification for Stochastic Rising Bandits

M Mussi, A Montenegro, F Trovó, M Restelli… - arXiv preprint arXiv …, 2023 - arxiv.org
Stochastic Rising Bandits (SRBs) model sequential decision-making problems in which the
expected rewards of the available options increase every time they are selected. This setting …
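To make the setting in the snippet above concrete, here is a minimal Python sketch of a rested rising bandit environment: each arm's expected reward depends only on how many times that arm itself has been pulled, and it never decreases. The reward curve mu_i(n) = a_i - c_i / (n + 1), the Gaussian noise, and the class name `RisingRestedBandit` are all hypothetical choices for illustration; they are not taken from the paper, which may use different assumptions on the curves.

```python
import random
from typing import List


class RisingRestedBandit:
    """Toy rested rising bandit (hypothetical illustration, not the paper's model).

    Assumed reward curve: mu_i(n) = a_i - c_i / (n + 1), where n is the number of
    times arm i has already been pulled. With c_i >= 0 it is nondecreasing in n.
    """

    def __init__(self, a: List[float], c: List[float],
                 noise_std: float = 0.1, seed: int = 0):
        assert len(a) == len(c), "one (a_i, c_i) pair per arm"
        self.a = a
        self.c = c
        self.noise_std = noise_std
        # Rested: only the pulled arm's counter advances.
        self.pulls = [0] * len(a)
        self.rng = random.Random(seed)

    def expected_reward(self, arm: int) -> float:
        """Expected reward of the next pull of `arm`, given its pull count so far."""
        return self.a[arm] - self.c[arm] / (self.pulls[arm] + 1)

    def pull(self, arm: int) -> float:
        """Return a noisy reward, then advance only this arm's pull counter."""
        mean = self.expected_reward(arm)
        self.pulls[arm] += 1
        return mean + self.rng.gauss(0.0, self.noise_std)


if __name__ == "__main__":
    env = RisingRestedBandit(a=[0.6, 0.9], c=[0.1, 0.8])
    # Before any pulls, arm 0 has the higher expected reward.
    print([round(env.expected_reward(i), 3) for i in range(2)])
    for _ in range(20):
        env.pull(1)
    # After 20 pulls of arm 1, its expected reward has overtaken arm 0's.
    print([round(env.expected_reward(i), 3) for i in range(2)])
```

The example shows why these problems are harder than stationary MABs: an arm that looks worse early on can become the better one only if it is selected often enough, so judging arms by their current means alone can be misleading.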

Budgeted Online Model Selection and Fine-Tuning via Federated Learning

PM Ghari, Y Shen - arXiv preprint arXiv:2401.10478, 2024 - arxiv.org
Online model selection involves selecting a model from a set of candidate models
'on the fly' to perform prediction on a stream of data. The choice of candidate models henceforth has …

Rising Rested Bandits: Lower Bounds and Efficient Algorithms

M Fiandri, AM Metelli, F Trovo - arXiv preprint arXiv:2411.14446, 2024 - arxiv.org
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential
selection techniques able to learn online using only the feedback given by the chosen …

Rising Rested MAB with Linear Drift

O Amichay, Y Mansour - arXiv preprint arXiv:2501.04403, 2025 - arxiv.org
We consider non-stationary multi-arm bandit (MAB) where the expected reward of each
action follows a linear function of the number of times we executed the action. Our main …
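As a worked illustration of the linear-drift model in the snippet above, assume arm i's expected reward on its (n+1)-th pull is a_i + b_i * n. Committing to a single arm for T rounds then yields sum_{n=0}^{T-1} (a_i + b_i n) = T a_i + b_i T (T - 1) / 2 in expectation. The Python sketch below computes this quantity and shows that which arm is best to commit to can change with the horizon. The function names and numbers are hypothetical, and the sketch is not the algorithm proposed in the paper; it only evaluates fixed single-arm strategies.

```python
from typing import List, Tuple


def commit_value(a: float, b: float, horizon: int) -> float:
    """Cumulative expected reward of pulling one arm `horizon` times in a row.

    Assumes a hypothetical linear drift mu(n) = a + b * n, where n is the number
    of previous pulls of that arm, so the total is T*a + b*T*(T-1)/2.
    """
    return horizon * a + b * horizon * (horizon - 1) / 2


def best_single_arm(arms: List[Tuple[float, float]], horizon: int) -> int:
    """Index of the arm whose commit_value is largest for this horizon."""
    values = [commit_value(a, b, horizon) for a, b in arms]
    return max(range(len(arms)), key=values.__getitem__)


if __name__ == "__main__":
    # (intercept a, slope b) per arm; hypothetical values.
    arms = [(0.5, 0.0), (0.1, 0.01)]
    for horizon in (10, 100):
        # Short horizon favours the high-intercept arm; long horizon the steep one.
        print(horizon, best_single_arm(arms, horizon))
```

With these values, the flat arm wins at T = 10 (5.0 versus 1.45), while the drifting arm wins at T = 100 (50 versus 59.5), which is why the horizon matters when rewards grow with the number of pulls.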

Convergence-Aware Online Model Selection with Time-Increasing Bandits

Y Xia, F Kong, T Yu, L Guo, RA Rossi, S Kim… - The Web Conference … - openreview.net
Web-based applications such as chatbots, search engines and news recommendations
continue to grow in scale and complexity with the recent surge in the adoption of large …

Scalable Online Decision Making: Algorithm Design and Fundamental Limits

PM Ghari - 2024 - search.proquest.com
Decision-making and real-time prediction in non-stationary and dynamic environments
present significant challenges for the application of machine learning and artificial …

Scalable Online Decision Making: Algorithm Design and Fundamental Limits

P Mollaebrahim Ghari - 2024 - escholarship.org
Decision-making and real-time prediction in non-stationary and dynamic environments
present significant challenges for the application of machine learning and artificial …