Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy
We study a bilevel economic system, which we refer to as a Markov exchange economy
(MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE …
Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …
Noise-adaptive Thompson sampling for linear contextual bandits
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …
Fairness in matching under uncertainty
The prevalence and importance of algorithmic two-sided marketplaces has drawn attention
to the issue of fairness in such settings. Algorithmic decisions are used in assigning students …
Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …
Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …
Statistical inference and A/B testing for first-price pacing equilibria
We initiate the study of statistical inference and A/B testing for first-price pacing equilibria
(FPPE). The FPPE model captures the dynamics resulting from large-scale first-price auction …
Rate-optimal contextual online matching bandit
Two-sided online matching platforms have been employed in various markets. However,
agents' preferences in present market are usually implicit and unknown and must be learned …
Player-optimal stable regret for bandit learning in matching markets
The problem of matching markets has been studied for a long time in the literature due to its
wide range of applications. Finding a stable matching is a common equilibrium objective in …
Finding regularized competitive equilibria of heterogeneous agent macroeconomic models via reinforcement learning
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income and …