Multiobjective Lipschitz bandits under lexicographic ordering

B Xue, J Cheng, F Liu, Y Wang, Q Zhang - Proceedings of the AAAI …, 2024 - ojs.aaai.org
This paper studies the multiobjective bandit problem under lexicographic ordering, wherein
the learner aims to simultaneously maximize $m$ objectives hierarchically. The only …

Cooperative learning for adversarial multi-armed bandit on open multi-agent systems

T Nakamura, N Hayashi… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org
This letter considers a cooperative decision-making method for an adversarial bandit
problem on open multi-agent systems. In an open multi-agent system, the network …

Online convex optimization with unbounded memory

R Kumar, S Dean, R Kleinberg - Advances in Neural …, 2023 - proceedings.neurips.cc
Online convex optimization (OCO) is a widely used framework in online learning. In each
round, the learner chooses a decision in a convex set and an adversary chooses a convex …

A Unified Regularization Approach to High-Dimensional Generalized Tensor Bandits

J Li, Y Yang, Y Wang, S Tang - arXiv preprint arXiv:2501.10722, 2025 - arxiv.org
Modern decision-making scenarios often involve data that is both high-dimensional and rich
in higher-order contextual information, where existing bandit algorithms fail to generate …

An Adaptive Method for Non-Stationary Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System

J Gornet, M Hosseinzadeh, B Sinopoli - arXiv preprint arXiv:2406.10418, 2024 - arxiv.org
Online decision-making can be formulated as the popular stochastic multi-armed bandit
problem where a learner makes decisions (or takes actions) to maximize cumulative …

Learning From Interactions via Online Decision-Making and Network Science

R Kumar - 2024 - search.proquest.com
Interactions between a learner and an environment arise in a variety of domains, ranging
from online recommendations (e.g., Spotify) to control of physical dynamical systems (e.g., …