Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Comprehensive transformer-based model architecture for real-world storm prediction

F Lin, X Yuan, Y Zhang, P Sigdel, L Chen… - … Conference on Machine …, 2023 - Springer
Storm prediction provides early alerts that allow for preparation, avoiding potential damage to property and risks to human safety. However, a traditional storm prediction model usually incurs …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …
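
For context, a minimal sketch of standard linear Thompson sampling appears below (Python/NumPy on synthetic data). It keeps a ridge-regression posterior over the unknown reward parameter and plays the arm that looks best under a posterior sample. The fixed posterior scale v and all variable names are illustrative assumptions; the noise-adaptive algorithm in the cited paper adjusts this exploration scale to the unknown noise level, which this baseline does not attempt.

    import numpy as np

    rng = np.random.default_rng(0)
    d, K, T = 5, 10, 2000                         # feature dim, arms per round, horizon
    theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown parameter (synthetic)

    V = np.eye(d)    # ridge-regularized Gram matrix
    b = np.zeros(d)  # running sum of reward-weighted contexts
    v = 0.5          # fixed posterior inflation scale (assumed, not noise-adaptive)

    for t in range(T):
        X = rng.normal(size=(K, d))                   # contexts observed this round
        mu = np.linalg.solve(V, b)                    # ridge estimate of theta
        Sigma = v ** 2 * np.linalg.inv(V)             # posterior covariance
        theta_t = rng.multivariate_normal(mu, Sigma)  # posterior sample
        a = int(np.argmax(X @ theta_t))               # play the arm best under the sample
        r = float(X[a] @ theta_star) + 0.1 * rng.standard_normal()
        V += np.outer(X[a], X[a])                     # rank-one posterior update
        b += r * X[a]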

Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …
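
As background, a plain (variance-agnostic) LSTD-style estimator is the usual starting point for OPE with linear function approximation: solve a regularized linear system in the features and read off the value of the target policy. The sketch below is a hypothetical helper, not the paper's variance-aware method, which additionally reweights samples by estimated conditional variances.

    import numpy as np

    def lstd_ope(phi, phi_next, r, gamma=0.99, lam=1e-3):
        """Variance-agnostic LSTD baseline for off-policy evaluation.

        phi      : (n, d) features of visited state-actions
        phi_next : (n, d) features of successor states under the target policy
        r        : (n,)   observed rewards
        Returns w such that the value estimate is phi(s) @ w.
        """
        d = phi.shape[1]
        A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(d)
        b = phi.T @ r
        return np.linalg.solve(A, b)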

Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback

C Zhao, R Yang, B Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this work, we study low-rank MDPs with adversarially changing losses in the full-
information feedback setting. In particular, the unknown transition probability kernel admits a …

Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning

Y Fei, R Xu - International Conference on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …
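
For reference, the entropic risk measure this entry builds on is standard: for a cumulative return $X$ and risk parameter $\beta \neq 0$,

    U_\beta(X) = \frac{1}{\beta} \log \mathbb{E}\big[ e^{\beta X} \big],

which is risk-seeking for $\beta > 0$ and risk-averse for $\beta < 0$, and recovers the ordinary expectation as $\beta \to 0$ via the expansion $U_\beta(X) = \mathbb{E}[X] + \frac{\beta}{2} \mathrm{Var}(X) + O(\beta^2)$.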

Augment online linear optimization with arbitrarily bad machine-learned predictions

D Wen, Y Li, FCM Lau - IEEE INFOCOM 2024-IEEE Conference …, 2024 - ieeexplore.ieee.org
The online linear optimization paradigm is important to many real-world network
applications as well as theoretical algorithmic studies. Recent studies have made attempts …
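
As a point of comparison, the classical prediction-free baseline for online linear optimization is projected online gradient descent, sketched below over the Euclidean unit ball (a sketch under assumed names and step size; the cited work instead studies how to fold in machine-learned predictions while staying robust when those predictions are arbitrarily bad, which this baseline does not cover).

    import numpy as np

    def project_ball(x, radius=1.0):
        # Euclidean projection onto the ball of the given radius.
        n = np.linalg.norm(x)
        return x if n <= radius else x * (radius / n)

    def ogd(loss_vectors, d, eta=0.05):
        # Projected online gradient descent: play x_t, observe cost vector c_t,
        # suffer c_t @ x_t, and descend (the gradient of a linear loss is c_t).
        x = np.zeros(d)
        plays = []
        for c in loss_vectors:
            plays.append(x.copy())
            x = project_ball(x - eta * c)
        return plays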

Learning adversarial linear mixture Markov decision processes with bandit feedback and unknown transition

C Zhao, R Yang, B Wang, S Li - The Eleventh International …, 2023 - openreview.net
We study reinforcement learning (RL) with linear function approximation, unknown
transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …