Google Scholar

Multiobjective Lipschitz bandits under lexicographic ordering

B Xue, J Cheng, F Liu, Y Wang, Q Zhang - Proceedings of the AAAI …, 2024 - ojs.aaai.org

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein
the learner aims to simultaneously maximize $m$ objectives hierarchically. The only …

Cited by 2

Cooperative learning for adversarial multi-armed bandit on open multi-agent systems

T Nakamura, N Hayashi… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org

This letter considers a cooperative decision-making method for an adversarial bandit
problem on open multi-agent systems. In an open multi-agent system, the network …

Cited by 9

Online convex optimization with unbounded memory

R Kumar, S Dean, R Kleinberg - Advances in Neural …, 2023 - proceedings.neurips.cc

Online convex optimization (OCO) is a widely used framework in online learning. In each
round, the learner chooses a decision in a convex set and an adversary chooses a convex …

Cited by 8

A Unified Regularization Approach to High-Dimensional Generalized Tensor Bandits

J Li, Y Yang, Y Wang, S Tang - arXiv preprint arXiv:2501.10722, 2025 - arxiv.org

Modern decision-making scenarios often involve data that is both high-dimensional and rich
in higher-order contextual information, where existing bandit algorithms fail to generate …

An Adaptive Method for Non-Stationary Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System

J Gornet, M Hosseinzadeh, B Sinopoli - arXiv preprint arXiv:2406.10418, 2024 - arxiv.org

Online decision-making can be formulated as the popular stochastic multi-armed bandit
problem where a learner makes decisions (or takes actions) to maximize cumulative …

Learning From Interactions via Online Decision-Making and Network Science

R Kumar - 2024 - search.proquest.com

Interactions between a learner and an environment arise in a variety of domains, ranging
from online recommendations (e.g., Spotify) to control of physical dynamical systems (e.g. …


Stochastic contextual bandits with long horizon rewards

Multiobjective Lipschitz bandits under lexicographic ordering
