Trustworthy distributed AI systems: Robustness, privacy, and governance
Emerging Distributed AI systems are revolutionizing big data computing and data
processing capabilities with growing economic and societal impact. However, recent studies …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …
Bypassing the simulator: Near-optimal adversarial linear contextual bandits
We consider the adversarial linear contextual bandit problem, where the loss vectors are
selected fully adversarially and the per-round action set (i.e., the context) is drawn from a fixed …
Feel-good Thompson sampling for contextual bandits and reinforcement learning
T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM
Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …
Nearly optimal algorithms for linear contextual bandits with adversarial corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e., …
Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes
Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …
Contextual bandits with large action spaces: Made practical
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …
Bayesian decision-making under misspecified priors with applications to meta-learning
M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …
A model selection approach for corruption robust reinforcement learning
We develop a model selection approach to tackle reinforcement learning with adversarial
corruption in both transition and reward. For finite-horizon tabular MDPs, without prior …