[BOOK][B] Bayesian optimization

R Garnett - 2023 - books.google.com
Bayesian optimization is a methodology for optimizing expensive objective functions that
has proven successful in the sciences, engineering, and beyond. This timely text provides a …
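The blurb above describes Bayesian optimization only in general terms. As background, a minimal sketch of the standard loop (a Gaussian-process surrogate plus an upper-confidence-bound acquisition rule on a 1-D grid; all constants and names here are illustrative, not from the book) might look like:

```python
import numpy as np

def rbf(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """GP posterior mean and standard deviation at x_query."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_query, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def bayes_opt(f, n_iter=20, beta=2.0, seed=0):
    """Maximize f on [0, 1] with a GP surrogate and UCB acquisition."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0, 1, 200)
    xs = list(rng.uniform(size=2))  # two random initial evaluations
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        mean, std = gp_posterior(np.array(xs), np.array(ys), grid)
        x_next = grid[np.argmax(mean + beta * std)]  # UCB rule
        xs.append(x_next)
        ys.append(f(x_next))
    return xs[int(np.argmax(ys))]

best = bayes_opt(lambda x: -(x - 0.3) ** 2)  # toy objective, maximum at x = 0.3
```

The UCB term `beta * std` is what makes the loop explore regions the surrogate is unsure about instead of greedily re-sampling the current best guess.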

POEM: Out-of-distribution detection with posterior sampling

Y Ming, Y Fan, Y Li - International Conference on Machine …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is indispensable for machine learning models
deployed in the open world. Recently, the use of an auxiliary outlier dataset during training …

Randomized exploration in cooperative multi-agent reinforcement learning

HL Hsu, W Wang, M Pajic, P Xu - Advances in Neural …, 2025 - proceedings.neurips.cc
We present the first study on provably efficient randomized exploration in cooperative multi-
agent reinforcement learning (MARL). We propose a unified algorithm framework for …

Langevin Monte Carlo for contextual bandits

P Xu, H Zheng, EV Mazumdar… - International …, 2022 - proceedings.mlr.press
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson
sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian …
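The truncated abstract contrasts Laplace approximations with Langevin-based posterior sampling. A minimal sketch of Thompson sampling driven by unadjusted Langevin updates on a linear-reward contextual bandit (step sizes, dimensions, and the linear model are illustrative assumptions, not the paper's algorithm) could be:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 3, 5, 200
theta_true = rng.normal(size=d)  # hypothetical true reward parameter

def log_post_grad(theta, X, y, prior_var=10.0):
    """Gradient of the Gaussian log-posterior for linear rewards."""
    if len(y) == 0:
        return -theta / prior_var
    return X.T @ (y - X @ theta) - theta / prior_var

def langevin_sample(theta, X, y, step=1e-3, n_steps=50):
    """Approximate posterior sample via unadjusted Langevin dynamics."""
    for _ in range(n_steps):
        noise = rng.normal(size=theta.shape)
        theta = theta + step * log_post_grad(theta, X, y) + np.sqrt(2 * step) * noise
    return theta

X_hist, y_hist = [], []
theta_hat = np.zeros(d)
for t in range(T):
    contexts = rng.normal(size=(n_arms, d))
    # Thompson step: act greedily w.r.t. one (approximate) posterior sample.
    theta_hat = langevin_sample(
        theta_hat, np.array(X_hist).reshape(-1, d), np.array(y_hist)
    )
    arm = int(np.argmax(contexts @ theta_hat))
    reward = contexts[arm] @ theta_true + 0.1 * rng.normal()
    X_hist.append(contexts[arm])
    y_hist.append(reward)
```

Warm-starting each round's chain from the previous sample keeps the number of gradient steps per round small, which is the practical appeal over re-fitting a Laplace approximation.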

[PDF] Use your instinct: Instruction optimization using neural bandits coupled with transformers

X Lin, Z Wu, Z Dai, W Hu, Y Shu, SK Ng, P Jaillet… - arXiv preprint arXiv …, 2023 - mit.edu
Large language models (LLMs) have shown remarkable instruction-following capabilities
and achieved impressive performances in various applications. However, the performances …

Contextual bandits with large action spaces: Made practical

Y Zhu, DJ Foster, J Langford… - … Conference on Machine …, 2022 - proceedings.mlr.press
A central problem in sequential decision making is to develop algorithms that are practical
and computationally efficient, yet support the use of flexible, general-purpose models …

Approximate Thompson sampling via epistemic neural networks

I Osband, Z Wen, SM Asghari… - Uncertainty in …, 2023 - proceedings.mlr.press
Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling
from a posterior distribution. Unfortunately, this can become computationally intractable in …
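The abstract is cut off before describing epistemic neural networks themselves. As background on the problem it names, one common way to approximate Thompson sampling without an explicit posterior is a bootstrapped ensemble, sampling one member per decision (a generic sketch under that assumption, not the paper's epinet architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms, K, T = 4, 6, 10, 300
theta_true = rng.normal(size=d)  # hypothetical true reward parameter

# Ensemble of K linear reward models with random initializations; the
# spread across members stands in for posterior (epistemic) uncertainty.
ensemble = rng.normal(scale=0.5, size=(K, d))

for t in range(T):
    contexts = rng.normal(size=(n_arms, d))
    k = rng.integers(K)  # Thompson step: sample one ensemble member
    arm = int(np.argmax(contexts @ ensemble[k]))
    reward = contexts[arm] @ theta_true + 0.1 * rng.normal()
    # Bootstrap mask: each member sees the observation with probability 0.5,
    # so members stay decorrelated and the ensemble keeps its spread.
    for j in range(K):
        if rng.random() < 0.5:
            pred = contexts[arm] @ ensemble[j]
            ensemble[j] += 0.05 * (reward - pred) * contexts[arm]  # SGD step
```

Acting on a single sampled member per round is what recovers Thompson-sampling-style exploration without ever forming a posterior distribution.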

Neural contextual bandits with deep representation and shallow exploration

P Xu, Z Wen, H Zhao, Q Gu - arXiv preprint arXiv:2012.01780, 2020 - arxiv.org
We study a general class of contextual bandits, where each context-action pair is associated
with a raw feature vector, but the reward generating function is unknown. We propose a …
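The abstract stops mid-sentence, but the title's "deep representation and shallow exploration" idea is commonly realized as linear exploration on top of a learned feature map. A toy sketch with frozen random ReLU features standing in for the learned network, and LinUCB as the shallow exploration layer (all names and constants hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
d_raw, d_feat, n_arms, T = 5, 16, 4, 150
W = rng.normal(size=(d_feat, d_raw)) / np.sqrt(d_raw)  # frozen "deep" layer

def features(x):
    """Stand-in for a learned representation: random ReLU features."""
    return np.maximum(W @ x, 0.0)

theta_true = rng.normal(size=d_feat)  # hypothetical reward head
A = np.eye(d_feat)  # ridge Gram matrix for the linear head
b = np.zeros(d_feat)
alpha = 1.0         # exploration width

for t in range(T):
    contexts = rng.normal(size=(n_arms, d_raw))
    phi = np.array([features(x) for x in contexts])
    A_inv = np.linalg.inv(A)
    theta = A_inv @ b
    # LinUCB: exploit the linear head, explore only in the last layer.
    ucb = phi @ theta + alpha * np.sqrt(np.einsum("ij,jk,ik->i", phi, A_inv, phi))
    arm = int(np.argmax(ucb))
    reward = phi[arm] @ theta_true + 0.1 * rng.normal()
    A += np.outer(phi[arm], phi[arm])
    b += reward * phi[arm]
```

Confining the confidence bonus to the last linear layer keeps the exploration machinery cheap even when the representation underneath is a large network.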

Optimal order simple regret for Gaussian process bandits

S Vakili, N Bouziani, S Jalali… - Advances in Neural …, 2021 - proceedings.neurips.cc
Consider the sequential optimization of a continuous, possibly non-convex, and expensive
to evaluate objective function $ f $. The problem can be cast as a Gaussian Process (GP) …

Quantum Bayesian optimization

Z Dai, GKR Lau, A Verma, Y Shu… - Advances in Neural …, 2023 - proceedings.neurips.cc
Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent
method for optimizing complicated black-box reward functions. Various BO algorithms have …