- Academic Search

Methods for improving the availability of spot instances: A survey

L Lin, L Pan, S Liu - Computers in Industry, 2022 - Elsevier

The burgeoning development of the cloud market has promoted the expansion of resources
held by cloud providers, but the resulting underutilization caused by the over-provisioned …

Cited by 17

Fast active learning for pure exploration in reinforcement learning

P Ménard, OD Domingues, A Jonsson… - International …, 2021 - proceedings.mlr.press

Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …

Cited by 91

Instance-dependent near-optimal policy identification in linear MDPs via online experiment design

A Wagenmaker, KG Jamieson - Advances in Neural …, 2022 - proceedings.neurips.cc

While much progress has been made in understanding the minimax sample complexity of
reinforcement learning (RL), the complexity of learning on the "worst-case" instance, such …

Cited by 31

Adaptive reward-free exploration

E Kaufmann, P Ménard… - Algorithmic …, 2021 - proceedings.mlr.press

Reward-free exploration is a reinforcement learning setting recently studied by (** et al.
2020), who address it by running several algorithms with regret guarantees in parallel. In our …

Cited by 96

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc

In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Cited by 7

Mixture martingales revisited with applications to sequential tests and confidence intervals

E Kaufmann, WM Koolen - Journal of Machine Learning Research, 2021 - jmlr.org

This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …

Cited by 141

Towards theoretical understanding of inverse reinforcement learning

AM Metelli, F Lazzati, M Restelli - … Conference on Machine …, 2023 - proceedings.mlr.press

Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a
reward function justifying the behavior demonstrated by an expert agent. A well-known …

Cited by 23

Beyond no regret: Instance-dependent PAC reinforcement learning

AJ Wagenmaker, M Simchowitz… - … on Learning Theory, 2022 - proceedings.mlr.press

The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …

Cited by 40

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press

We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Cited by 15

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc

We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ …

Cited by 9