When is partially observable reinforcement learning not scary?

Q Liu, A Chung, C Szepesvári… - Conference on Learning …, 2022 - proceedings.mlr.press
Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …

Learning in observable POMDPs, without computationally intractable oracles

N Golowich, A Moitra, D Rohatgi - Advances in neural …, 2022 - proceedings.neurips.cc
Much of reinforcement learning theory is built on top of oracles that are computationally hard
to implement. Specifically for learning near-optimal policies in Partially Observable Markov …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Partially observable RL with B-stability: Unified structural condition and sharp sample-efficient algorithms

F Chen, Y Bai, S Mei - arXiv preprint arXiv:2209.14990, 2022 - arxiv.org
Partial Observability--where agents can only observe partial information about the true
underlying state of the system--is ubiquitous in real-world applications of Reinforcement …

Reinforcement learning with state observation costs in action-contingent noiselessly observable Markov decision processes

HJA Nam, S Fleming… - Advances in Neural …, 2021 - proceedings.neurips.cc
Many real-world problems that require making optimal sequences of decisions under
uncertainty involve costs when the agent wishes to obtain information about its environment …

Simple agent, complex environment: Efficient reinforcement learning with agent states

S Dong, B Van Roy, Z Zhou - Journal of Machine Learning Research, 2022 - jmlr.org
We design a simple reinforcement learning (RL) agent that implements an optimistic version
of Q-learning and establish through regret analysis that this agent can operate with some …

Sublinear regret for learning POMDPs

Y **ong, N Chen, X Gao… - Production and …, 2022 - journals.sagepub.com
We study the model‐based undiscounted reinforcement learning for partially observable
Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the …

Bayesian learning of optimal policies in Markov decision processes with countably infinite state-space

S Adler, V Subramanian - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Models of many real-life applications, such as queueing models of communication
networks or computing systems, have a countably infinite state-space. Algorithmic and …

Provably efficient representation learning with tractable planning in low-rank POMDP

J Guo, Z Li, H Wang, M Wang… - … on Machine Learning, 2023 - proceedings.mlr.press
In this paper, we study representation learning in partially observable Markov Decision
Processes (POMDPs), where the agent learns a decoder function that maps a series of high …

Online learning for stochastic shortest path model via posterior sampling

M Jafarnia-Jahromi, L Chen, R Jain, H Luo - arXiv preprint arXiv …, 2021 - arxiv.org
We consider the problem of online reinforcement learning for the Stochastic Shortest Path
(SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL …