Google Scholar

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 220 · All 14 versions

[PDF] nowpublishers.com

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1268 · All 19 versions
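
The exploration-exploitation balance described in this snippet can be illustrated with a minimal sketch of Thompson sampling for a Bernoulli bandit. It assumes binary rewards and a uniform Beta(1, 1) prior on each arm; the function name and the simulated arm probabilities are hypothetical and not taken from the tutorial.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated K-armed bandit.

    true_means are hypothetical success probabilities, hidden from the agent.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta(1, 1) prior: alpha = 1 + observed successes
    beta = [1] * k   # beta = 1 + observed failures
    pulls = [0] * k
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior...
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # ...and act greedily on the draws; posterior spread keeps
        # uncertain arms in play, which supplies the exploration.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate Beta update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.7], horizon=2000)
print(pulls)  # most pulls are expected to go to the 0.7 arm
```

Sampling from the posterior, rather than adding an explicit bonus, is what distinguishes this approach from optimism-based rules such as UCB.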

[PDF] nowpublishers.com

Model-based reinforcement learning: A survey

TM Moerland, J Broekens, A Plaat… - … and Trends® in …, 2023 - nowpublishers.com

Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …

Cited by 954 · All 15 versions
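
As a minimal illustration of the MDP optimization this snippet refers to, the sketch below solves a hypothetical two-state discounted MDP by value iteration. It assumes the transition and reward model is known. In model-based reinforcement learning, such a model would typically be learned from experience and then used for planning in the same way. The function name, discount factor and toy MDP are illustrative assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Discounted value iteration for a small tabular MDP with a known model.

    P[s][a] is a list of (next_state, probability) pairs and R[s][a] is the
    expected immediate reward. Returns (V, policy).
    """
    n = len(P)
    V = [0.0] * n
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')].
        Q = [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
              for a in range(len(P[s]))]
             for s in range(n)]
        V_new = [max(q) for q in Q]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            break
        V = V_new
    # Greedy policy with respect to the final Q-values.
    policy = [max(range(len(Q[s])), key=lambda a: Q[s][a]) for s in range(n)]
    return V_new, policy


# Hypothetical two-state MDP: action 0 = "stay", action 1 = "switch".
# Staying in state 1 pays 1 per step; every other choice pays 0.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(1, 1.0)], [(0, 1.0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, pi = value_iteration(P, R)
print(pi)  # [1, 0]: switch out of state 0, then stay in state 1
print(V)   # approximately [9.0, 10.0]
```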

[PDF] tor-lattimore.com

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3363 · All 9 versions

[PDF] nowpublishers.com

Bayesian reinforcement learning: A survey

M Ghavamzadeh, S Mannor, J Pineau… - … and Trends® in …, 2015 - nowpublishers.com

Bayesian methods for machine learning have been widely investigated, yielding principled
methods for incorporating prior information into inference algorithms. In this survey, we …

Cited by 592 · All 11 versions

[PDF] jmlr.org

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org

We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Cited by 362 · All 8 versions

[PDF] mlr.press

Why is posterior sampling better than optimism for reinforcement learning?

I Osband, B Van Roy - International conference on machine …, 2017 - proceedings.mlr.press

Computational results demonstrate that posterior sampling for reinforcement learning
(PSRL) dramatically outperforms existing algorithms driven by optimism, such as UCRL2 …

Cited by 286 · All 9 versions
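
The posterior-sampling idea behind PSRL can be sketched for a small tabular episodic MDP. The sketch makes several simplifying assumptions: rewards are known, each row of the transition matrix gets a Dirichlet(1, ..., 1) prior, the horizon H is fixed and every episode starts in state 0. All function names and the toy environment are hypothetical. At the start of each episode it samples one MDP from the posterior, computes that MDP's optimal policy by backward induction, and follows that policy for the whole episode before updating the transition counts.

```python
import random


def sample_dirichlet(rng, alphas):
    # A Dirichlet draw is a vector of independent Gamma(alpha_i, 1) draws, normalized.
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def plan(P, R, H):
    """Backward induction for a finite-horizon MDP; P[s][a][s2] are probabilities.
    Returns policy[h][s] for steps h = 0..H-1."""
    n, m = len(R), len(R[0])
    V = [0.0] * n  # value after the last step
    policy = [None] * H
    for h in reversed(range(H)):
        Q = [[R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(n))
              for a in range(m)]
             for s in range(n)]
        policy[h] = [max(range(m), key=lambda a, s=s: Q[s][a]) for s in range(n)]
        V = [max(Q[s]) for s in range(n)]
    return policy


def psrl(true_P, R, H, episodes, seed=0):
    """Posterior sampling for RL with known rewards R[s][a] and unknown transitions.

    true_P[s][a] is the hidden next-state distribution, used only to simulate
    the environment. Returns (total reward, Dirichlet counts)."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    counts = [[[1.0] * n for _ in range(m)] for _ in range(n)]  # Dirichlet(1, ..., 1) prior
    total_reward = 0.0
    for _ in range(episodes):
        # 1. Sample a single MDP from the current posterior.
        P_sample = [[sample_dirichlet(rng, counts[s][a]) for a in range(m)]
                    for s in range(n)]
        # 2. Solve the sampled MDP and commit to its optimal policy for the episode.
        policy = plan(P_sample, R, H)
        s = 0  # hypothetical fixed start state
        for h in range(H):
            a = policy[h][s]
            total_reward += R[s][a]
            s2 = rng.choices(range(n), weights=true_P[s][a])[0]
            # 3. Update the posterior with the observed transition.
            counts[s][a][s2] += 1.0
            s = s2
    return total_reward, counts


# Hypothetical two-state environment: in state 0, action 1 moves to state 1;
# in state 1, action 0 stays put and earns reward 1.
true_P = [[[1.0, 0.0], [0.0, 1.0]],
          [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
total, counts = psrl(true_P, R, H=10, episodes=50)
print(total)
```

Committing to one sampled MDP for a full episode, instead of resampling at every step, is the design choice that lets the agent carry out multi-step exploratory plans.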

[PDF] mlr.press

Linear Thompson sampling revisited

M Abeille, A Lazaric - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press

We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …

Cited by 306 · All 18 versions
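
A minimal, standard-library-only sketch of linear Thompson sampling follows. At each round it samples a parameter from a Gaussian centred on the ridge-regression estimate with covariance $v^2 V^{-1}$, where $V$ is the regularized Gram matrix of the arms played so far. It then plays the arm that is best under the sampled parameter. The arm set, true parameter, noise level, regularizer `lam` and exploration scale `v` are hypothetical choices. In particular, `v` is an illustrative constant rather than the value prescribed by the paper's regret analysis.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    d = len(A)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L, b):
    # Forward substitution for L y = b.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    # Back substitution for L^T x = y (L^T is upper triangular).
    d = len(y)
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, d))) / L[i][i]
    return x


def linear_ts(arms, theta_star, horizon, v=0.5, lam=1.0, noise=0.1, seed=0):
    """Linear Thompson sampling with a fixed finite arm set (hypothetical setup).
    Returns how many times each arm was pulled."""
    rng = random.Random(seed)
    d = len(theta_star)
    V = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]  # lam*I + sum x x^T
    b = [0.0] * d  # sum of reward * x
    pulls = [0] * len(arms)
    for _ in range(horizon):
        L = cholesky(V)
        theta_hat = solve_upper_t(L, solve_lower(L, b))  # ridge estimate V^{-1} b
        # L^{-T} z with z ~ N(0, I) has covariance (L L^T)^{-1} = V^{-1}.
        pert = solve_upper_t(L, [rng.gauss(0.0, 1.0) for _ in range(d)])
        theta_tilde = [theta_hat[i] + v * pert[i] for i in range(d)]
        k = max(range(len(arms)),
                key=lambda j: sum(x * t for x, t in zip(arms[j], theta_tilde)))
        x = arms[k]
        reward = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, noise)
        for i in range(d):
            b[i] += reward * x[i]
            for j in range(d):
                V[i][j] += x[i] * x[j]
        pulls[k] += 1
    return pulls


# Hypothetical instance: expected rewards are 0.2, 0.9 and 0.66.
arms = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
print(linear_ts(arms, theta_star=[0.2, 0.9], horizon=1000))
```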

[PDF] mlr.press

Generalization and exploration via randomized value functions

I Osband, B Van Roy, Z Wen - International Conference on …, 2016 - proceedings.mlr.press

We propose randomized least-squares value iteration (RLSVI)–a new reinforcement
learning algorithm designed to explore and generalize efficiently via linearly parameterized …

Cited by 356 · All 7 versions
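
The core loop of RLSVI can be sketched in the tabular special case. There, the linear features are one-hot indicators of (state, action), with a separate parameter vector for each step h, so the Bayesian least-squares posterior decouples into one independent Gaussian per entry. Working backward from the last step, the sketch regresses on targets r + max_a Q̃_{h+1}(s', a) built from the value function sampled one step later, draws Q̃_h from the resulting posterior, and then acts greedily on the samples for one episode. The prior variance, noise scale `sigma`, start state 0 and the `toy_step` simulator are hypothetical assumptions. With general features, the per-entry update would be replaced by Bayesian linear regression on the feature matrix.

```python
import random


def rlsvi_tabular(step, n_states, n_actions, H, episodes,
                  sigma=1.0, prior_var=1.0, seed=0):
    """Tabular RLSVI sketch. step(s, a, rng) -> (reward, next_state) is a
    hypothetical environment simulator. Returns the return of each episode."""
    rng = random.Random(seed)
    # data[h][s][a] holds the (reward, next_state) pairs observed at step h.
    data = [[[[] for _ in range(n_actions)] for _ in range(n_states)] for _ in range(H)]
    returns = []
    for _ in range(episodes):
        # Backward pass: Q[H] = 0; sample each Q[h] from its Gaussian posterior.
        Q = [[[0.0] * n_actions for _ in range(n_states)] for _ in range(H + 1)]
        for h in reversed(range(H)):
            for s in range(n_states):
                for a in range(n_actions):
                    targets = [r + max(Q[h + 1][s2]) for r, s2 in data[h][s][a]]
                    precision = 1.0 / prior_var + len(targets) / sigma ** 2
                    mean = (sum(targets) / sigma ** 2) / precision
                    Q[h][s][a] = rng.gauss(mean, (1.0 / precision) ** 0.5)
        # Forward pass: act greedily on the sampled Q and record the data.
        s, total = 0, 0.0
        for h in range(H):
            a = max(range(n_actions), key=lambda b: Q[h][s][b])
            r, s2 = step(s, a, rng)
            data[h][s][a].append((r, s2))
            total += r
            s = s2
        returns.append(total)
    return returns


def toy_step(s, a, rng):
    # Hypothetical deterministic environment: action 1 in state 0 moves to
    # state 1; action 0 in state 1 stays there and pays 1.
    if s == 1 and a == 0:
        return 1.0, 1
    if s == 0 and a == 1:
        return 0.0, 1
    return 0.0, 0


print(rlsvi_tabular(toy_step, n_states=2, n_actions=2, H=5, episodes=100)[-5:])
```

Sampling a whole value function and following it greedily for the episode mirrors Thompson sampling, but applies the randomization to value estimates instead of to an explicit model.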

[PDF] mlr.press

Frequentist regret bounds for randomized least-squares value iteration

A Zanette, D Brandfonbrener… - International …, 2020 - proceedings.mlr.press

We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning
(RL). When the state space is large or continuous, traditional tabular approaches are …

Cited by 158 · All 4 versions
