When is partially observable reinforcement learning not scary?

Q Liu, A Chung, C Szepesvári… - Conference on Learning …, 2022 - proceedings.mlr.press
Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …

Learning in observable POMDPs, without computationally intractable oracles

N Golowich, A Moitra, D Rohatgi - Advances in neural …, 2022 - proceedings.neurips.cc
Much of reinforcement learning theory is built on top of oracles that are computationally hard
to implement. Specifically for learning near-optimal policies in Partially Observable Markov …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Partially observable RL with B-stability: Unified structural condition and sharp sample-efficient algorithms

F Chen, Y Bai, S Mei - arXiv preprint arXiv:2209.14990, 2022 - arxiv.org
Partial Observability--where agents can only observe partial information about the true
underlying state of the system--is ubiquitous in real-world applications of Reinforcement …

Reinforcement learning with state observation costs in action-contingent noiselessly observable Markov decision processes

HJA Nam, S Fleming… - Advances in Neural …, 2021 - proceedings.neurips.cc
Many real-world problems that require making optimal sequences of decisions under
uncertainty involve costs when the agent wishes to obtain information about its environment …

Simple agent, complex environment: Efficient reinforcement learning with agent states

S Dong, B Van Roy, Z Zhou - Journal of Machine Learning Research, 2022 - jmlr.org
We design a simple reinforcement learning (RL) agent that implements an optimistic version
of Q-learning and establish through regret analysis that this agent can operate with some …

Sublinear regret for learning POMDPs

Y **ong, N Chen, X Gao… - Production and …, 2022 - journals.sagepub.com
We study the model‐based undiscounted reinforcement learning for partially observable
Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the …

Bayesian learning of optimal policies in Markov decision processes with countably infinite state-space

S Adler, V Subramanian - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Models of many real-life applications, such as queueing models of communication
networks or computing systems, have a countably infinite state-space. Algorithmic and …

Provably efficient representation learning with tractable planning in low-rank POMDP

J Guo, Z Li, H Wang, M Wang… - … on Machine Learning, 2023 - proceedings.mlr.press
In this paper, we study representation learning in partially observable Markov Decision
Processes (POMDPs), where the agent learns a decoder function that maps a series of high …

Online learning for stochastic shortest path model via posterior sampling

M Jafarnia-Jahromi, L Chen, R Jain, H Luo - arXiv preprint arXiv …, 2021 - arxiv.org
We consider the problem of online reinforcement learning for the Stochastic Shortest Path
(SSP) problem modeled as an unknown MDP with an absorbing state. We propose PSRL …