Oracle-free reinforcement learning in mean-field games along a single sample path

MAU Zaman, A Koppel, S Bhatt… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional
approaches, we alleviate the need for a mean-field oracle by developing an algorithm that …

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

W Mou, A Pananjady, MJ Wainwright… - arXiv preprint arXiv …, 2021 - arxiv.org
We study stochastic approximation procedures for approximately solving a $d$-dimensional
linear fixed point equation based on observing a trajectory of length $n$ from …
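
The object here is a linear fixed-point equation $\bar{A}\theta^* = \bar{b}$ whose coefficients are stationary expectations of per-state observations. As a minimal, illustrative sketch (not the paper's procedure or step-size schedule), the Python snippet below runs the basic Polyak-Ruppert-averaged linear stochastic-approximation iteration along one simulated trajectory; the chain P and the observation maps A(x), b(x) are made-up toy choices.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state Markov chain and linear observation model: we want
# theta solving  E_pi[A(X)] theta = E_pi[b(X)]  from a single trajectory.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
A_obs = [np.eye(2) * (1.0 + 0.2 * s) for s in range(3)]              # A(x)
b_obs = [np.array([1.0, -1.0]) * (1.0 + 0.1 * s) for s in range(3)]  # b(x)

def lsa_with_averaging(n=50_000, step=0.05):
    """theta_{t+1} = theta_t + step * (b(X_t) - A(X_t) theta_t) along one
    trajectory; return the running (Polyak-Ruppert) average of iterates."""
    x = 0
    theta = np.zeros(2)
    theta_bar = np.zeros(2)
    for t in range(1, n + 1):
        theta = theta + step * (b_obs[x] - A_obs[x] @ theta)
        theta_bar += (theta - theta_bar) / t   # incremental average
        x = rng.choice(3, p=P[x])              # next state of the chain
    return theta_bar

print(lsa_with_averaging())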

Estimating the mixing time of ergodic Markov chains

G Wolfer, A Kontorovich - Conference on Learning Theory, 2019 - proceedings.mlr.press
We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary
ergodic finite Markov chain from a single trajectory of length $m$. The reversible case was …
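
For orientation, one simple baseline is a plug-in approach: build the empirical transition matrix from the single trajectory and read off the absolute spectral gap, whose inverse (the relaxation time) controls $t_{\mathsf{mix}}$ up to logarithmic factors for reversible chains. The sketch below illustrates that plug-in idea on a toy reversible chain; the chain, trajectory length, and symmetrization step are assumptions made for the example, and this is not the estimator or the confidence intervals developed in the paper.

import numpy as np

rng = np.random.default_rng(1)

# Ground-truth reversible chain (a birth-death chain is reversible).
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])

# Observe a single trajectory of length m.
m = 200_000
x, traj = 0, np.empty(m, dtype=int)
for t in range(m):
    traj[t] = x
    x = rng.choice(3, p=P[x])

# Plug-in estimates: empirical transition matrix P_hat and stationary law pi_hat.
counts = np.zeros((3, 3))
np.add.at(counts, (traj[:-1], traj[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)
pi_hat = np.bincount(traj, minlength=3) / m

# For a reversible chain, D P D^{-1} with D = diag(sqrt(pi)) is symmetric;
# the absolute spectral gap gamma* = 1 - |lambda_2| gives the relaxation
# time 1/gamma*, which bounds t_mix up to log factors.
D = np.diag(np.sqrt(pi_hat))
S = D @ P_hat @ np.linalg.inv(D)
eigs = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2)))[::-1]
gamma_star = 1.0 - eigs[1]
print("plug-in relaxation time 1/gamma* =", 1.0 / gamma_star)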

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

G Kotsalis, G Lan, T Li - SIAM Journal on Optimization, 2022 - SIAM
The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A
prominent application of our algorithmic developments is the stochastic policy evaluation …
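
The policy-evaluation application can be made concrete with the simplest Markovian-noise method, TD(0): stochastic approximation for the fixed point $V = r + \gamma P V$, driven by one trajectory of the chain induced by the policy. The sketch below runs that baseline on a toy Markov reward process, with made-up transition matrix, rewards, and step sizes; it does not reproduce the algorithms developed in the paper.

import numpy as np

rng = np.random.default_rng(2)

# Markov reward process induced by a fixed policy (illustrative numbers).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# TD(0): stochastic approximation for V = r + gamma * P V from one trajectory.
V = np.zeros(3)
x = 0
for t in range(1, 200_001):
    x_next = rng.choice(3, p=P[x])
    td_error = r[x] + gamma * V[x_next] - V[x]
    V[x] += (10.0 / (t + 100)) * td_error   # diminishing step size
    x = x_next

V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
print("TD estimate:", V, "exact:", V_exact)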

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-
order methods with strong theoretical guarantees for both policy optimization and policy …

Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Information geometry of Markov kernels: a survey

G Wolfer, S Watanabe - Frontiers in Physics, 2023 - frontiersin.org
Information geometry and Markov chains are two powerful tools used in modern fields such
as finance, physics, computer science, and epidemiology. In this survey, we explore their …

Information geometry of reversible Markov chains

G Wolfer, S Watanabe - Information Geometry, 2021 - Springer
We analyze the information geometric structure of time reversibility for parametric families of
irreducible transition kernels of Markov chains. We define and characterize reversible …

BR-SNIS: Bias reduced self-normalized importance sampling

G Cardoso, S Samsonov, A Thin… - Advances in Neural …, 2022 - proceedings.neurips.cc
Importance Sampling (IS) is a method for approximating expectations with respect to a target
distribution using independent samples from a proposal distribution and the associated …
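
The baseline behind the title is ordinary self-normalized importance sampling (SNIS): draw independent samples from the proposal, weight each by the ratio of the (possibly unnormalized) target density to the proposal density, normalize the weights, and average. A minimal sketch follows, with an illustrative Gaussian proposal and unnormalized Gaussian target; the bias-reduction scheme of the paper itself is not implemented here.

import numpy as np

rng = np.random.default_rng(3)

def snis_mean(log_target, proposal_sample, proposal_logpdf, f, n=100_000):
    """Self-normalized IS estimate of E_target[f(X)].
    log_target may be unnormalized; the constant cancels after normalization."""
    x = proposal_sample(n)
    log_w = log_target(x) - proposal_logpdf(x)
    w = np.exp(log_w - log_w.max())   # stabilize before normalizing
    w /= w.sum()
    return np.sum(w * f(x))

# Illustrative target: unnormalized N(2, 0.5^2); proposal: N(0, 2^2).
est = snis_mean(
    log_target=lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2,
    proposal_sample=lambda n: rng.normal(0.0, 2.0, size=n),
    proposal_logpdf=lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2 * np.pi)),
    f=lambda x: x,
)
print("SNIS estimate of the target mean (true value 2.0):", est)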

Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence

A Pananjady, V Muthukumar, A Thangaraj - Journal of Machine Learning …, 2024 - jmlr.org
We study the problem of estimating the stationary mass (also called the unigram mass)
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …
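
For background, the classical i.i.d. version of this problem is handled by the Good-Turing estimator: the missing mass is estimated by the fraction of the sample made up of symbols seen exactly once. The sketch below applies that estimator naively to a single Markov trajectory, purely to illustrate the quantity being estimated; the chain and trajectory length are arbitrary, and this is not the estimator analyzed in the paper.

import numpy as np
from collections import Counter

rng = np.random.default_rng(4)

# Single trajectory from an ergodic lazy chain on a largish state space:
# stay put with prob 0.5, else jump to a uniformly random state, so the
# stationary distribution is uniform over the k states.
k, n = 2_000, 5_000
x, traj = 0, []
for _ in range(n):
    traj.append(x)
    x = x if rng.random() < 0.5 else int(rng.integers(k))

# Good-Turing estimate of the missing stationary mass: fraction of the
# trajectory occupied by states seen exactly once (an i.i.d. estimator,
# applied here to dependent data only for illustration).
counts = Counter(traj)
singletons = sum(1 for c in counts.values() if c == 1)
true_missing = (k - len(counts)) / k   # stationary law is uniform here
print("Good-Turing estimate:", singletons / n, "| true missing mass:", true_missing)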