Offline multi-agent reinforcement learning with implicit global-to-local value regularization
Offline reinforcement learning (RL) has received considerable attention in recent years due
to its attractive capability of learning policies from offline datasets without environmental …
Safe offline reinforcement learning with feasibility-guided diffusion model
Safe offline RL is a promising way to bypass risky online interactions towards safe policy
learning. Most existing methods only enforce soft constraints, i.e., constraining safety …
ODICE: Revealing the mystery of distribution correction estimation via orthogonal-gradient update
In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an
important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE …
Are Expressive Models Truly Necessary for Offline RL?
Among various branches of offline reinforcement learning (RL) methods, goal-conditioned
supervised learning (GCSL) has gained increasing popularity as it formulates the offline RL …
Data Center Cooling System Optimization Using Offline Reinforcement Learning
The recent advances in information technology and artificial intelligence have fueled a rapid
expansion of the data center (DC) industry worldwide, accompanied by an immense …
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn
effective policies from pre-collected datasets without the need for further environmental …
Sample-Efficient Behavior Cloning Using General Domain Knowledge
Behavior cloning has shown success in many sequential decision-making tasks by learning
from expert demonstrations, yet it can be very sample inefficient and fail to generalize to …