Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com
Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Why is posterior sampling better than optimism for reinforcement learning?

I Osband, B Van Roy - International conference on machine …, 2017 - proceedings.mlr.press
Computational results demonstrate that posterior sampling for reinforcement learning
(PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2 …

Linear Thompson sampling revisited

M Abeille, A Lazaric - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …

Generalization and exploration via randomized value functions

I Osband, B Van Roy, Z Wen - International Conference on …, 2016 - proceedings.mlr.press
We propose randomized least-squares value iteration (RLSVI), a new reinforcement
learning algorithm designed to explore and generalize efficiently via linearly parameterized …

Frequentist regret bounds for randomized least-squares value iteration

A Zanette, D Brandfonbrener… - International …, 2020 - proceedings.mlr.press
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning
(RL). When the state space is large or continuous, traditional tabular approaches are …