Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Comprehensive transformer-based model architecture for real-world storm prediction

F Lin, X Yuan, Y Zhang, P Sigdel, L Chen… - … Conference on Machine …, 2023 - Springer
Storm prediction provides early alerts that allow for preparation, avoiding potential damage to property and risks to human safety. However, a traditional storm prediction model usually incurs …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …
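
For context, a minimal sketch of standard linear Thompson sampling appears below (Python/NumPy on synthetic data). It keeps a ridge-regression posterior over the unknown reward parameter and plays the arm that looks best under a posterior sample. The fixed posterior scale v and all variable names are illustrative assumptions; the noise-adaptive algorithm in the cited paper adjusts this exploration scale to the unknown noise level, which this baseline does not attempt.

    import numpy as np

    rng = np.random.default_rng(0)
    d, K, T = 5, 10, 2000                         # feature dim, arms per round, horizon
    theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown parameter (synthetic)

    V = np.eye(d)    # ridge-regularized Gram matrix
    b = np.zeros(d)  # running sum of reward-weighted contexts
    v = 0.5          # fixed posterior inflation scale (assumed, not noise-adaptive)

    for t in range(T):
        X = rng.normal(size=(K, d))                   # contexts observed this round
        mu = np.linalg.solve(V, b)                    # ridge estimate of theta
        Sigma = v ** 2 * np.linalg.inv(V)             # posterior covariance
        theta_t = rng.multivariate_normal(mu, Sigma)  # posterior sample
        a = int(np.argmax(X @ theta_t))               # play the arm best under the sample
        r = float(X[a] @ theta_star) + 0.1 * rng.standard_normal()
        V += np.outer(X[a], X[a])                     # rank-one posterior update
        b += r * X[a]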

Cooperative multi-agent reinforcement learning: asynchronous communication and linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study multi-agent reinforcement learning in the setting of episodic Markov decision
processes, where many agents cooperate via communication through a central server. We …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …
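
As background, a plain (variance-agnostic) LSTD-style estimator is the usual starting point for OPE with linear function approximation: solve a regularized linear system in the features and read off the value of the target policy. The sketch below is a hypothetical helper, not the paper's variance-aware method, which additionally reweights samples by estimated conditional variances.

    import numpy as np

    def lstd_ope(phi, phi_next, r, gamma=0.99, lam=1e-3):
        """Variance-agnostic LSTD baseline for off-policy evaluation.

        phi      : (n, d) features of visited state-actions
        phi_next : (n, d) features of successor states under the target policy
        r        : (n,)   observed rewards
        Returns w such that the value estimate is phi(s) @ w.
        """
        d = phi.shape[1]
        A = phi.T @ (phi - gamma * phi_next) + lam * np.eye(d)
        b = phi.T @ r
        return np.linalg.solve(A, b)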

Learning adversarial low-rank Markov decision processes with unknown transition and full-information feedback

C Zhao, R Yang, B Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this work, we study low-rank MDPs with adversarially changing losses in the full-
information feedback setting. In particular, the unknown transition probability kernel admits a …

Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning

Y Fei, R Xu - International Conference on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …
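
For reference, the entropic risk measure this entry builds on is standard: for a cumulative return $X$ and risk parameter $\beta \neq 0$,

    U_\beta(X) = \frac{1}{\beta} \log \mathbb{E}\big[ e^{\beta X} \big],

which is risk-seeking for $\beta > 0$ and risk-averse for $\beta < 0$, and recovers the ordinary expectation as $\beta \to 0$ via the expansion $U_\beta(X) = \mathbb{E}[X] + \frac{\beta}{2} \mathrm{Var}(X) + O(\beta^2)$.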

Augment online linear optimization with arbitrarily bad machine-learned predictions

D Wen, Y Li, FCM Lau - IEEE INFOCOM 2024-IEEE Conference …, 2024 - ieeexplore.ieee.org
The online linear optimization paradigm is important to many real-world network
applications as well as theoretical algorithmic studies. Recent studies have made attempts …
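
As a point of comparison, the classical prediction-free baseline for online linear optimization is projected online gradient descent, sketched below over the Euclidean unit ball (a sketch under assumed names and step size; the cited work instead studies how to fold in machine-learned predictions while staying robust when those predictions are arbitrarily bad, which this baseline does not cover).

    import numpy as np

    def project_ball(x, radius=1.0):
        # Euclidean projection onto the ball of the given radius.
        n = np.linalg.norm(x)
        return x if n <= radius else x * (radius / n)

    def ogd(loss_vectors, d, eta=0.05):
        # Projected online gradient descent: play x_t, observe cost vector c_t,
        # suffer c_t @ x_t, and descend (the gradient of a linear loss is c_t).
        x = np.zeros(d)
        plays = []
        for c in loss_vectors:
            plays.append(x.copy())
            x = project_ball(x - eta * c)
        return plays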

Learning adversarial linear mixture Markov decision processes with bandit feedback and unknown transition

C Zhao, R Yang, B Wang, S Li - The Eleventh International …, 2023 - openreview.net
We study reinforcement learning (RL) with linear function approximation, unknown
transition, and adversarial losses in the bandit feedback setting. Specifically, the unknown …