Oracle-free reinforcement learning in mean-field games along a single sample path

MAU Zaman, A Koppel, S Bhatt… - … Conference on Artificial …, 2023 - proceedings.mlr.press
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional
approaches, we alleviate the need for a mean-field oracle by developing an algorithm that …

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

W Mou, A Pananjady, MJ Wainwright… - arXiv preprint arXiv …, 2021 - arxiv.org
We study stochastic approximation procedures for approximately solving a $d$-dimensional
linear fixed point equation based on observing a trajectory of length $n$ from …
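
The object here is a linear fixed-point equation $\bar{A}\theta^* = \bar{b}$ whose coefficients are stationary expectations of per-state observations. As a minimal, illustrative sketch (not the paper's procedure or step-size schedule), the Python snippet below runs the basic Polyak-Ruppert-averaged linear stochastic-approximation iteration along one simulated trajectory; the chain P and the observation maps A(x), b(x) are made-up toy choices.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative 3-state Markov chain and linear observation model: we want
# theta solving  E_pi[A(X)] theta = E_pi[b(X)]  from a single trajectory.
P = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
A_obs = [np.eye(2) * (1.0 + 0.2 * s) for s in range(3)]              # A(x)
b_obs = [np.array([1.0, -1.0]) * (1.0 + 0.1 * s) for s in range(3)]  # b(x)

def lsa_with_averaging(n=50_000, step=0.05):
    """theta_{t+1} = theta_t + step * (b(X_t) - A(X_t) theta_t) along one
    trajectory; return the running (Polyak-Ruppert) average of iterates."""
    x = 0
    theta = np.zeros(2)
    theta_bar = np.zeros(2)
    for t in range(1, n + 1):
        theta = theta + step * (b_obs[x] - A_obs[x] @ theta)
        theta_bar += (theta - theta_bar) / t   # incremental average
        x = rng.choice(3, p=P[x])              # next state of the chain
    return theta_bar

print(lsa_with_averaging())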

Estimating the mixing time of ergodic Markov chains

G Wolfer, A Kontorovich - Conference on Learning Theory, 2019 - proceedings.mlr.press
We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary
ergodic finite Markov chain from a single trajectory of length $m$. The reversible case was …
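
For orientation, one simple baseline is a plug-in approach: build the empirical transition matrix from the single trajectory and read off the absolute spectral gap, whose inverse (the relaxation time) controls $t_{\mathsf{mix}}$ up to logarithmic factors for reversible chains. The sketch below illustrates that plug-in idea on a toy reversible chain; the chain, trajectory length, and symmetrization step are assumptions made for the example, and this is not the estimator or the confidence intervals developed in the paper.

import numpy as np

rng = np.random.default_rng(1)

# Ground-truth reversible chain (a birth-death chain is reversible).
P = np.array([[0.7, 0.3, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.4, 0.6]])

# Observe a single trajectory of length m.
m = 200_000
x, traj = 0, np.empty(m, dtype=int)
for t in range(m):
    traj[t] = x
    x = rng.choice(3, p=P[x])

# Plug-in estimates: empirical transition matrix P_hat and stationary law pi_hat.
counts = np.zeros((3, 3))
np.add.at(counts, (traj[:-1], traj[1:]), 1)
P_hat = counts / counts.sum(axis=1, keepdims=True)
pi_hat = np.bincount(traj, minlength=3) / m

# For a reversible chain, D P D^{-1} with D = diag(sqrt(pi)) is symmetric;
# the absolute spectral gap gamma* = 1 - |lambda_2| gives the relaxation
# time 1/gamma*, which bounds t_mix up to log factors.
D = np.diag(np.sqrt(pi_hat))
S = D @ P_hat @ np.linalg.inv(D)
eigs = np.sort(np.abs(np.linalg.eigvalsh((S + S.T) / 2)))[::-1]
gamma_star = 1.0 - eigs[1]
print("plug-in relaxation time 1/gamma* =", 1.0 / gamma_star)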

Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning

G Kotsalis, G Lan, T Li - SIAM Journal on Optimization, 2022 - SIAM
The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A
prominent application of our algorithmic developments is the stochastic policy evaluation …
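
The policy-evaluation application can be made concrete with the simplest Markovian-noise method, TD(0): stochastic approximation for the fixed point $V = r + \gamma P V$, driven by one trajectory of the chain induced by the policy. The sketch below runs that baseline on a toy Markov reward process, with made-up transition matrix, rewards, and step sizes; it does not reproduce the algorithms developed in the paper.

import numpy as np

rng = np.random.default_rng(2)

# Markov reward process induced by a fixed policy (illustrative numbers).
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# TD(0): stochastic approximation for V = r + gamma * P V from one trajectory.
V = np.zeros(3)
x = 0
for t in range(1, 200_001):
    x_next = rng.choice(3, p=P[x])
    td_error = r[x] + gamma * V[x_next] - V[x]
    V[x] += (10.0 / (t + 100)) * td_error   # diminishing step size
    x = x_next

V_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
print("TD estimate:", V, "exact:", V_exact)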

Stochastic first-order methods for average-reward Markov decision processes

T Li, F Wu, G Lan - Mathematics of Operations Research, 2024 - pubsonline.informs.org
We study average-reward Markov decision processes (AMDPs) and develop novel first-
order methods with strong theoretical guarantees for both policy optimization and policy …

Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Information geometry of Markov kernels: a survey

G Wolfer, S Watanabe - Frontiers in Physics, 2023 - frontiersin.org
Information geometry and Markov chains are two powerful tools used in modern fields such
as finance, physics, computer science, and epidemiology. In this survey, we explore their …

Information geometry of reversible Markov chains

G Wolfer, S Watanabe - Information Geometry, 2021 - Springer
We analyze the information geometric structure of time reversibility for parametric families of
irreducible transition kernels of Markov chains. We define and characterize reversible …

BR-SNIS: Bias reduced self-normalized importance sampling

G Cardoso, S Samsonov, A Thin… - Advances in Neural …, 2022 - proceedings.neurips.cc
Importance Sampling (IS) is a method for approximating expectations with respect to a target
distribution using independent samples from a proposal distribution and the associated …
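
The baseline behind the title is ordinary self-normalized importance sampling (SNIS): draw independent samples from the proposal, weight each by the ratio of the (possibly unnormalized) target density to the proposal density, normalize the weights, and average. A minimal sketch follows, with an illustrative Gaussian proposal and unnormalized Gaussian target; the bias-reduction scheme of the paper itself is not implemented here.

import numpy as np

rng = np.random.default_rng(3)

def snis_mean(log_target, proposal_sample, proposal_logpdf, f, n=100_000):
    """Self-normalized IS estimate of E_target[f(X)].
    log_target may be unnormalized; the constant cancels after normalization."""
    x = proposal_sample(n)
    log_w = log_target(x) - proposal_logpdf(x)
    w = np.exp(log_w - log_w.max())   # stabilize before normalizing
    w /= w.sum()
    return np.sum(w * f(x))

# Illustrative target: unnormalized N(2, 0.5^2); proposal: N(0, 2^2).
est = snis_mean(
    log_target=lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2,
    proposal_sample=lambda n: rng.normal(0.0, 2.0, size=n),
    proposal_logpdf=lambda x: -0.5 * (x / 2.0) ** 2 - np.log(2.0 * np.sqrt(2 * np.pi)),
    f=lambda x: x,
)
print("SNIS estimate of the target mean (true value 2.0):", est)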

Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence

A Pananjady, V Muthukumar, A Thangaraj - Journal of Machine Learning …, 2024 - jmlr.org
We study the problem of estimating the stationary mass (also called the unigram mass)
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …
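
For background, the classical i.i.d. version of this problem is handled by the Good-Turing estimator: the missing mass is estimated by the fraction of the sample made up of symbols seen exactly once. The sketch below applies that estimator naively to a single Markov trajectory, purely to illustrate the quantity being estimated; the chain and trajectory length are arbitrary, and this is not the estimator analyzed in the paper.

import numpy as np
from collections import Counter

rng = np.random.default_rng(4)

# Single trajectory from an ergodic lazy chain on a largish state space:
# stay put with prob 0.5, else jump to a uniformly random state, so the
# stationary distribution is uniform over the k states.
k, n = 2_000, 5_000
x, traj = 0, []
for _ in range(n):
    traj.append(x)
    x = x if rng.random() < 0.5 else int(rng.integers(k))

# Good-Turing estimate of the missing stationary mass: fraction of the
# trajectory occupied by states seen exactly once (an i.i.d. estimator,
# applied here to dependent data only for illustration).
counts = Counter(traj)
singletons = sum(1 for c in counts.values() if c == 1)
true_missing = (k - len(counts)) / k   # stationary law is uniform here
print("Good-Turing estimate:", singletons / n, "| true missing mass:", true_missing)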