Diversify and disambiguate: Learning from underspecified data
K-level reasoning for zero-shot coordination in Hanabi
The standard problem setting in cooperative multi-agent learning is self-play (SP),
where the goal is to train a team of agents that works well together. However, optimal …
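The title above mentions K-level reasoning, in which a level-0 agent acts naively and each level-k agent best-responds to a level-(k-1) partner. A minimal sketch on a hypothetical 3x3 cooperative matrix game (the payoff matrix and recursion depth here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical cooperative payoff matrix: both players receive
# payoff[a1, a2] when they pick actions a1 and a2.
payoff = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
])

def k_level_policy(k: int) -> np.ndarray:
    """Level-0 acts uniformly at random; level-k best-responds to level-(k-1)."""
    if k == 0:
        return np.ones(3) / 3.0
    partner = k_level_policy(k - 1)
    expected = payoff @ partner          # expected payoff of each of our actions
    best = np.zeros(3)
    best[np.argmax(expected)] = 1.0      # deterministic best response
    return best

# Level-1 best-responds to a uniform partner; higher levels stay at the
# highest-payoff joint action.
print(k_level_policy(1))  # -> [0. 0. 1.]
```

Because each level is defined only with respect to the level below it, no agent needs to be trained jointly with its eventual partner, which is what makes this style of reasoning attractive for zero-shot coordination.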
Exploration-exploitation in multi-agent learning: Catastrophe theory meets game theory
Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL);
however, its effects are far from understood. To make progress in this direction, we study a …
Iteratively learn diverse strategies with state distance information
In complex reinforcement learning (RL) problems, policies with similar rewards may have
substantially different behaviors. It remains a fundamental challenge to optimize rewards …
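One way to operationalize "state distance information" when iteratively learning diverse strategies is to reward a new policy for visiting states far from those visited by earlier policies. The sketch below is a simplified nearest-neighbor bonus under that assumption; the distance measure and scaling are illustrative, not the paper's exact formulation:

```python
import numpy as np

def diversity_bonus(state, reference_states, scale=1.0):
    """Bonus equal to the distance from `state` to the nearest state
    visited by previously learned policies (illustrative measure)."""
    if len(reference_states) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(reference_states) - state, axis=1)
    return scale * float(dists.min())

# Toy 2-D states collected from an earlier policy's rollouts.
prev_states = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
print(diversity_bonus(np.array([0.0, 3.0]), prev_states))  # -> 3.0
```

In an iterative scheme, this bonus would be added to the task reward when training each successive policy, pushing it toward regions of the state space the archive has not yet covered.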
Continuously discovering novel strategies via reward-switching policy optimization
We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse
strategies in complex RL environments by iteratively finding novel policies that are both …
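The reward-switching idea can be sketched as follows: where a step is too likely under a previously discovered policy, the task reward is switched out for a diversity term that penalizes that likelihood. This toy version uses a per-step log-probability threshold; the threshold, scaling, and per-step (rather than per-trajectory) switching are my assumptions for illustration, not RSPO's exact mechanism:

```python
import numpy as np

def switched_rewards(task_rewards, logp_prev, threshold=-5.0, beta=0.1):
    """Toy reward switching: where logp_prev (step log-probability under a
    previously found policy) exceeds `threshold`, the step resembles an old
    strategy, so the task reward is replaced with a diversity reward
    -beta * logp_prev; otherwise the task reward is kept."""
    task_rewards = np.asarray(task_rewards, dtype=float)
    logp_prev = np.asarray(logp_prev, dtype=float)
    diversity = -beta * logp_prev
    return np.where(logp_prev > threshold, diversity, task_rewards)

# Steps 0 and 2 resemble an earlier policy (high log-prob), so their task
# reward is switched out for the diversity term; step 1 keeps its reward.
print(switched_rewards([1.0, 1.0, 1.0], [-1.0, -10.0, -2.0]))
```

Optimizing the switched reward steers the new policy away from trajectories that earlier policies already produce, while still collecting task reward on genuinely novel behavior.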