Offline multi-agent reinforcement learning with implicit global-to-local value regularization

X Wang, H Xu, Y Zheng, X Zhan - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) has received considerable attention in recent years due
to its attractive capability of learning policies from offline datasets without environmental …

Safe offline reinforcement learning with feasibility-guided diffusion model

Y Zheng, J Li, D Yu, Y Yang, SE Li, X Zhan… - arXiv preprint arXiv …, 2024 - arxiv.org
Safe offline RL is a promising way to bypass risky online interactions towards safe policy
learning. Most existing methods only enforce soft constraints, i.e., constraining safety …

ODICE: Revealing the mystery of distribution correction estimation via orthogonal-gradient update

L Mao, H Xu, W Zhang, X Zhan - arXiv preprint arXiv:2402.00348, 2024 - arxiv.org
In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an
important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE …

Are Expressive Models Truly Necessary for Offline RL?

G Wang, H Niu, J Li, L Jiang, J Hu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Among various branches of offline reinforcement learning (RL) methods, goal-conditioned
supervised learning (GCSL) has gained increasing popularity as it formulates the offline RL …

Data Center Cooling System Optimization Using Offline Reinforcement Learning

X Zhan, X Zhu, P Cheng, X Hu, Z He, H Geng… - arXiv preprint arXiv …, 2025 - arxiv.org
The recent advances in information technology and artificial intelligence have fueled a rapid
expansion of the data center (DC) industry worldwide, accompanied by an immense …

ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization

TV Bui, TH Nguyen, T Mai - arXiv preprint arXiv:2410.01954, 2024 - arxiv.org
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn
effective policies from pre-collected datasets without the need for further environmental …

Sample-Efficient Behavior Cloning Using General Domain Knowledge

F Zhu, J Oh, R Simmons - arXiv preprint arXiv:2501.16546, 2025 - arxiv.org
Behavior cloning has shown success in many sequential decision-making tasks by learning
from expert demonstrations, yet they can be very sample inefficient and fail to generalize to …