Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Advances of machine learning in materials science: Ideas and techniques

SS Chong, YS Ng, HQ Wang, JC Zheng - Frontiers of Physics, 2024 - Springer
In this big data era, the use of large dataset in conjunction with machine learning (ML) has
been increasingly popular in both industry and academia. In recent times, the field of …

Efficient and targeted COVID-19 border testing via reinforcement learning

H Bastani, K Drakopoulos, V Gupta, I Vlachogiannis… - Nature, 2021 - nature.com
Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a
variety of ad hoc border control protocols to allow for non-essential travel while safeguarding …

Federated linear contextual bandits

R Huang, W Wu, J Yang… - Advances in neural …, 2021 - proceedings.neurips.cc
This paper presents a novel federated linear contextual bandits model, where individual
clients face different $K$-armed stochastic bandits coupled through common global …

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

The sample complexity of online contract design

B Zhu, S Bates, Z Yang, Y Wang, J Jiao… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the hidden-action principal-agent problem in an online setting. In each round, the
principal posts a contract that specifies the payment to the agent based on each outcome …

Multi-armed bandit experimental design: Online decision-making and adaptive inference

D Simchi-Levi, C Wang - International Conference on …, 2023 - proceedings.mlr.press
Multi-armed bandit has been well-known for its efficiency in online decision-making in terms
of minimizing the loss of the participants' welfare during experiments (i.e., the regret). In …

Provably efficient Q-learning with low switching cost

Y Bai, T Xie, N Jiang, YX Wang - Advances in Neural …, 2019 - proceedings.neurips.cc
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change its exploration policy as infrequently as possible during regret …

Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …