Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …

Causal semantic communication for digital twins: A generalizable imitation learning approach

CK Thomas, W Saad, Y Xiao - IEEE Journal on Selected Areas …, 2023 - ieeexplore.ieee.org
A digital twin (DT) leverages a virtual representation of the physical world, along with
communication (e.g., 6G), computing (e.g., edge computing), and artificial intelligence (AI) …

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Contextual information-directed sampling

B Hao, T Lattimore, C Qin - International Conference on …, 2022 - proceedings.mlr.press
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …

Deciding what to model: Value-equivalent sampling for reinforcement learning

D Arumugam, B Van Roy - Advances in neural information …, 2022 - proceedings.neurips.cc
The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …

Satisficing exploration for deep reinforcement learning

D Arumugam, S Kumar, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org
A default assumption in the design of reinforcement-learning algorithms is that a decision-
making agent always explores to learn optimal behavior. In sufficiently complex …

Provably efficient information-directed sampling algorithms for multi-agent reinforcement learning

Q Zhang, C Bai, S Hu, Z Wang, X Li - arXiv preprint arXiv:2404.19292, 2024 - arxiv.org
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement
learning (MARL) based on the principle of information-directed sampling (IDS). These …

On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning

D Arumugam, MK Ho, ND Goodman… - arXiv preprint arXiv …, 2022 - arxiv.org
Throughout the cognitive-science literature, there is widespread agreement that decision-
making agents operating in the real world do so under limited information-processing …

Exploration Unbound

D Arumugam, W Xu, B Van Roy - arXiv preprint arXiv:2407.12178, 2024 - arxiv.org
A sequential decision-making agent balances between exploring to gain new knowledge
about an environment and exploiting current knowledge to maximize immediate reward. For …

Parallel Bayesian Optimization Using Satisficing Thompson Sampling for Time-Sensitive Black-Box Optimization

X Song, B Jiang - arXiv preprint arXiv:2310.12526, 2023 - arxiv.org
Bayesian optimization (BO) is widely used for black-box optimization problems, and has
been shown to perform well in various real-world tasks. However, most of the existing BO …