Google Scholar

Statistical learning theory for control: A finite-sample perspective

A Tsiamis, I Ziemann, N Matni… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org

Learning algorithms have become an integral component to modern engineering solutions.
Examples range from self-driving cars and recommender systems to finance and even …

Cited by 84; 8 versions

[PDF] arxiv.org

On the sample complexity of the linear quadratic regulator

S Dean, H Mania, N Matni, B Recht, S Tu - Foundations of Computational …, 2020 - Springer

This paper addresses the optimal control problem known as the linear quadratic regulator in
the case when the dynamics are unknown. We propose a multistage procedure, called …

Cited by 662; 11 versions

[PDF] mlr.press

Naive exploration is optimal for online LQR

M Simchowitz, D Foster - International Conference on …, 2020 - proceedings.mlr.press

We consider the problem of online adaptive control of the linear quadratic regulator, where
the true system parameters are unknown. We prove new upper and lower bounds …

Cited by 232; 5 versions

[PDF] neurips.cc

Certainty equivalence is efficient for linear quadratic control

H Mania, S Tu, B Recht - Advances in Neural Information …, 2019 - proceedings.neurips.cc

We study the performance of the certainty equivalent controller on Linear Quadratic (LQ)
control problems with unknown transition dynamics. We show that for both the fully and …

Cited by 254; 8 versions

[PDF] jmlr.org

Reinforcement learning in continuous time and space: A stochastic control approach

H Wang, T Zariphopoulou, XY Zhou - Journal of Machine Learning …, 2020 - jmlr.org

We consider reinforcement learning (RL) in continuous time with continuous feature and
action spaces. We motivate and devise an exploratory formulation for the feature dynamics …

Cited by 189; 7 versions

[PDF] jmlr.org

Derivative-free methods for policy optimization: Guarantees for linear quadratic systems

D Malik, A Pananjady, K Bhatia, K Khamaru… - Journal of Machine …, 2020 - jmlr.org

We study derivative-free methods for policy optimization over the class of linear policies. We
focus on characterizing the convergence rate of these methods when applied to linear …

Cited by 234; 10 versions

[PDF] mlr.press

Learning Linear-Quadratic Regulators Efficiently with only $\sqrt{T}$ Regret

A Cohen, T Koren, Y Mansour - International Conference on …, 2019 - proceedings.mlr.press

We present the first computationally-efficient algorithm with $\widetilde{O}(\sqrt{T})$ regret
for learning in Linear Quadratic Control systems with unknown dynamics. By that, we resolve …

Cited by 213; 8 versions

[PDF] mlr.press

The gap between model-based and model-free methods on the linear quadratic regulator: An asymptotic viewpoint

S Tu, B Recht - Conference on Learning Theory, 2019 - proceedings.mlr.press

The effectiveness of model-based versus model-free methods is a long-standing question in
reinforcement learning (RL). Motivated by recent empirical success of RL on continuous …

Cited by 181; 4 versions

[PDF] neurips.cc

Logarithmic regret bound in partially observable linear dynamical systems

S Lale, K Azizzadenesheli, B Hassibi… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study the problem of system identification and adaptive control in partially observable
linear dynamical systems. Adaptive and closed-loop system identification is a challenging …

Cited by 117; 13 versions

[PDF] arxiv.org

Feel-good Thompson sampling for contextual bandits and reinforcement learning

T Zhang - SIAM Journal on Mathematics of Data Science, 2022 - SIAM

Thompson sampling has been widely used for contextual bandit problems due to the
flexibility of its modeling power. However, a general theory for this class of methods in the …

Cited by 72; 4 versions


Improved regret bounds for Thompson sampling in linear quadratic control problems

Statistical learning theory for control: A finite-sample perspective
