Methods for improving the availability of spot instances: A survey
L Lin, L Pan, S Liu - Computers in Industry, 2022 - Elsevier
The burgeoning development of the cloud market has promoted the expansion of resources
held by cloud providers, but the resulting underutilization caused by the over-provisioned …
Fast active learning for pure exploration in reinforcement learning
Realistic environments often provide agents with very limited feedback. When the
environment is initially unknown, the feedback, in the beginning, can be completely absent …
Instance-dependent near-optimal policy identification in linear MDPs via online experiment design
While much progress has been made in understanding the minimax sample complexity of
reinforcement learning (RL)---the complexity of learning on the "worst-case" instance---such …
Adaptive reward-free exploration
Reward-free exploration is a reinforcement learning setting recently studied by (** et al.
2020), who address it by running several algorithms with regret guarantees in parallel. In our …
Policy finetuning in reinforcement learning via design of experiments using offline data
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available but it is also possible to acquire some additional online data to help …
Mixture martingales revisited with applications to sequential tests and confidence intervals
This paper presents new deviation inequalities that are valid uniformly in time under
adaptive sampling in a multi-armed bandit model. The deviations are measured using the …
Towards theoretical understanding of inverse reinforcement learning
Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a
reward function justifying the behavior demonstrated by an expert agent. A well-known …
Beyond no regret: Instance-dependent PAC reinforcement learning
The theory of reinforcement learning has focused on two fundamental problems: achieving
low regret, and identifying $\epsilon$-optimal policies. While a simple reduction allows one …
Fast rates for maximum entropy exploration
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …
Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $ A …