Oracle-free reinforcement learning in mean-field games along a single sample path
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike traditional
approaches, we alleviate the need for a mean-field oracle by developing an algorithm that …
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
We study stochastic approximation procedures for approximately solving a $d$-dimensional
linear fixed point equation based on observing a trajectory of length $n$ from …
Estimating the mixing time of ergodic Markov chains
We address the problem of estimating the mixing time $t_{\mathsf{mix}}$ of an arbitrary
ergodic finite Markov chain from a single trajectory of length $m$. The reversible case was …
Simple and optimal methods for stochastic variational inequalities, II: Markovian noise and policy evaluation in reinforcement learning
The focus of this paper is on stochastic variational inequalities (VI) under Markovian noise. A
prominent application of our algorithmic developments is the stochastic policy evaluation …
Stochastic first-order methods for average-reward Markov decision processes
We study average-reward Markov decision processes (AMDPs) and develop novel first-order
methods with strong theoretical guarantees for both policy optimization and policy …
Information-directed pessimism for offline reinforcement learning
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …
Information geometry of Markov kernels: a survey
Information geometry and Markov chains are two powerful tools used in modern fields such
as finance, physics, computer science, and epidemiology. In this survey, we explore their …
Information geometry of reversible Markov chains
We analyze the information geometric structure of time reversibility for parametric families of
irreducible transition kernels of Markov chains. We define and characterize reversible …
BR-SNIS: bias reduced self-normalized importance sampling
Importance Sampling (IS) is a method for approximating expectations with respect to a target
distribution using independent samples from a proposal distribution and the associated to …
Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence
We study the problem of estimating the stationary mass---also called the unigram mass---
that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem …