Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy

Z Liu, M Lu, Z Wang, M Jordan… - … Conference on Machine …, 2022 - proceedings.mlr.press
We study a bilevel economic system, which we refer to as a Markov exchange economy
(MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …

Fairness in matching under uncertainty

S Devic, D Kempe, V Sharan… - … on Machine Learning, 2023 - proceedings.mlr.press
The prevalence and importance of algorithmic two-sided marketplaces has drawn attention
to the issue of fairness in such settings. Algorithmic decisions are used in assigning students …

Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning

Y Fei, R Xu - International Conference on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …

Cooperative multi-agent reinforcement learning: Asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Statistical inference and A/B testing for first-price pacing equilibria

L Liao, C Kroer - International Conference on Machine …, 2023 - proceedings.mlr.press
We initiate the study of statistical inference and A/B testing for first-price pacing equilibria
(FPPE). The FPPE model captures the dynamics resulting from large-scale first-price auction …

Rate-optimal contextual online matching bandit

Y Li, C Wang, G Cheng, WW Sun - arXiv preprint arXiv:2205.03699, 2022 - arxiv.org
Two-sided online matching platforms have been employed in various markets. However,
agents' preferences in present market are usually implicit and unknown and must be learned …

Player-optimal stable regret for bandit learning in matching markets

F Kong, S Li - Proceedings of the 2023 Annual ACM-SIAM …, 2023 - SIAM
The problem of matching markets has been studied for a long time in the literature due to its
wide range of applications. Finding a stable matching is a common equilibrium objective in …

Finding regularized competitive equilibria of heterogeneous agent macroeconomic models via reinforcement learning

R Xu, Y Min, T Wang, MI Jordan… - International …, 2023 - proceedings.mlr.press
We study a heterogeneous agent macroeconomic model with an infinite number of
households and firms competing in a labor market. Each household earns income and …