When is partially observable reinforcement learning not scary?
Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …
Learning in observable POMDPs, without computationally intractable oracles
Much of reinforcement learning theory is built on top of oracles that are computationally hard
to implement. Specifically for learning near-optimal policies in Partially Observable Markov …
Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …
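The pessimism principle behind this line of work can be made concrete in the simplest fully observed, tabular setting: estimate a model from the offline dataset and penalize each state-action value by an uncertainty bonus that grows where data is scarce. The sketch below illustrates that principle only, not the paper's confounded-POMDP method; `beta` and the 1/sqrt(count) penalty form are illustrative assumptions.

```python
import numpy as np

def pessimistic_value_iteration(dataset, n_states, n_actions,
                                gamma=0.99, beta=1.0, iters=200):
    """Tabular offline RL with a count-based pessimism penalty.

    dataset: iterable of (s, a, r, s_next) transitions.
    beta and the 1/sqrt(n) penalty are illustrative choices,
    not taken from the paper.
    """
    counts = np.zeros((n_states, n_actions))
    reward_sum = np.zeros((n_states, n_actions))
    trans = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s2 in dataset:
        counts[s, a] += 1
        reward_sum[s, a] += r
        trans[s, a, s2] += 1

    n = np.maximum(counts, 1)
    r_hat = reward_sum / n                  # empirical mean reward
    p_hat = trans / n[:, :, None]           # empirical transition kernel
    penalty = beta / np.sqrt(n)             # larger where data is scarce

    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)
        q = r_hat - penalty + gamma * (p_hat @ v)
    return q.argmax(axis=1)                 # greedy pessimistic policy
```

Because the penalty is largest on rarely visited state-action pairs, the greedy policy is steered toward the regions that the behavior policy actually covered.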
Partially observable RL with B-stability: Unified structural condition and sharp sample-efficient algorithms
Partial observability, where agents can only observe partial information about the true
underlying state of the system, is ubiquitous in real-world applications of Reinforcement …
Reinforcement learning with state observation costs in action-contingent noiselessly observable Markov decision processes
Many real-world problems that require making optimal sequences of decisions under
uncertainty involve costs when the agent wishes to obtain information about its environment …
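One way to operationalize the setting sketched here, where sensing itself is costly, is an environment wrapper that lets the agent decide at every step whether to pay for an observation. The wrapper below is a hypothetical toy, assuming a Gymnasium-style interface; the (env_action, observe) pairing, the fixed `cost`, and the NaN placeholder are illustrative choices, not the paper's action-contingent formalization.

```python
import numpy as np
import gymnasium as gym

class ObservationCostWrapper(gym.Wrapper):
    """Toy model of per-step observation costs (illustrative only).

    The action becomes a pair (env_action, observe). When observe is
    True the agent pays `cost` and receives the true observation;
    otherwise it receives a NaN placeholder and keeps the cost.
    Assumes vector-valued observations.
    """

    def __init__(self, env, cost=0.1):
        super().__init__(env)
        self.cost = cost

    def step(self, action):
        env_action, observe = action
        obs, reward, terminated, truncated, info = self.env.step(env_action)
        if observe:
            return obs, reward - self.cost, terminated, truncated, info
        # Withhold the observation: the agent must act on its belief.
        hidden = np.full_like(np.asarray(obs, dtype=float), np.nan)
        return hidden, reward, terminated, truncated, info
```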
Simple agent, complex environment: Efficient reinforcement learning with agent states
We design a simple reinforcement learning (RL) agent that implements an optimistic version
of Q-learning and establish through regret analysis that this agent can operate with some …
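As a rough illustration of the optimistic Q-learning idea in this snippet, in the tabular case with the observation used directly as the agent state, the sketch below initializes Q-values at an optimistic ceiling and adds a count-based bonus to each update. The Gymnasium-style interface, `bonus_scale`, and the 1/sqrt(n) bonus are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def optimistic_q_learning(env, n_states, n_actions, episodes=500,
                          horizon=100, gamma=0.99, bonus_scale=1.0):
    """Sketch of optimistic Q-learning with a count-based bonus."""
    # Optimistic initialization at the maximum discounted return.
    q = np.full((n_states, n_actions), 1.0 / (1.0 - gamma))
    counts = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, _ = env.reset()
        for _ in range(horizon):
            a = int(q[s].argmax())           # greedy w.r.t. optimistic Q
            s2, r, terminated, truncated, _ = env.step(a)
            counts[s, a] += 1
            lr = 1.0 / counts[s, a]          # decaying learning rate
            bonus = bonus_scale / np.sqrt(counts[s, a])
            target = r + bonus + gamma * q[s2].max()
            q[s, a] += lr * (target - q[s, a])
            s = s2
            if terminated or truncated:
                break
    return q
```

The bonus shrinks as a state-action pair is visited more often, so exploration is front-loaded and the estimates converge toward the unperturbed Q-values.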
Sublinear regret for learning POMDPs
We study model-based undiscounted reinforcement learning for partially observable
Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the …
Bayesian learning of optimal policies in Markov decision processes with countably infinite state-space
Models of many real-life applications, such as queueing models of communication
networks or computing systems, have a countably infinite state-space. Algorithmic and …
Provably efficient representation learning with tractable planning in low-rank POMDP
In this paper, we study representation learning in partially observable Markov Decision
Processes (POMDPs), where the agent learns a decoder function that maps a series of high …
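To fix ideas about what such a decoder could look like, here is a minimal PyTorch sketch that maps a short window of recent observations to a low-dimensional latent state. The MLP architecture, the fixed-window history, and all sizes are assumptions for illustration, not the paper's construction.

```python
import torch
import torch.nn as nn

class HistoryDecoder(nn.Module):
    """Illustrative decoder phi: last `window` observations -> latent state.

    Architecture and dimensions are placeholder assumptions.
    """

    def __init__(self, obs_dim, window, latent_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim * window, hidden),
            nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, obs_window):
        # obs_window: (batch, window, obs_dim) -> flatten the history
        flat = obs_window.flatten(start_dim=1)
        return self.net(flat)
```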
Online learning for stochastic shortest path model via posterior sampling
We consider the problem of online reinforcement learning for the Stochastic Shortest Path
(SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL …
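To illustrate the posterior-sampling template that PSRL-style methods follow, here is a tabular sketch: maintain Dirichlet counts over transitions, sample a model from the posterior at the start of each episode, and act greedily with respect to a policy computed on the sample. This is the generic finite-horizon version, not the paper's SSP-specific algorithm with an absorbing state; all names are illustrative.

```python
import numpy as np

def psrl_episode(trans_counts, reward_sum, reward_counts, horizon, rng):
    """One PSRL planning step for a tabular MDP (illustrative sketch).

    trans_counts: (S, A, S) Dirichlet posterior counts (prior of 1
    per entry keeps all parameters positive).
    reward_sum / reward_counts: running mean-reward statistics.
    Returns a greedy policy for an MDP sampled from the posterior.
    """
    n_states, n_actions, _ = trans_counts.shape
    # Sample a transition kernel row by row from the Dirichlet posterior.
    p = np.zeros_like(trans_counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            p[s, a] = rng.dirichlet(trans_counts[s, a])
    r = reward_sum / np.maximum(reward_counts, 1)

    # Finite-horizon dynamic programming on the sampled model.
    v = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        q = r + p @ v                      # (S, A) action values
        policy[h] = q.argmax(axis=1)
        v = q.max(axis=1)
    return policy
```

Between episodes the counts are updated with the observed transitions, which is what drives the posterior to concentrate and the sampled models to approach the true one.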