Diversify and disambiguate: Learning from underspecified data

Y Lee, H Yao, C Finn - ar…
… deep learning models, we usually decide what task we want to solve, then
search for a model that generalizes well on the task. An intriguing question would be: what if …

K-level reasoning for zero-shot coordination in Hanabi

B Cui, H Hu, L Pineda… - Advances in Neural …, 2021 - proceedings.neurips.cc
The standard problem setting in cooperative multi-agent settings is self-play (SP),
where the goal is to train a team of agents that works well together. However, optimal …
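A minimal sketch of the k-level recursion named in the title, on a hypothetical 3-action symmetric coordination game rather than Hanabi. It assumes a uniformly random level-0 policy and makes each level k a deterministic best response to level k-1. The payoff matrix and the helper names `best_response` and `k_level_policies` are illustrative assumptions, not code from the paper, which trains deep RL agents.

```python
import numpy as np


def best_response(payoff: np.ndarray, partner_policy: np.ndarray) -> np.ndarray:
    """One-hot best response to a partner's (possibly mixed) policy.

    payoff[a, b] is the shared team reward when we play a and the partner plays b.
    The game is assumed symmetric, so the same matrix serves both seats.
    """
    expected = payoff @ partner_policy  # expected reward of each of our actions
    policy = np.zeros_like(partner_policy)
    policy[np.argmax(expected)] = 1.0  # np.argmax breaks ties toward the lowest index
    return policy


def k_level_policies(payoff: np.ndarray, k: int) -> list[np.ndarray]:
    """Return the policies for levels 0, 1, ..., k."""
    n = payoff.shape[0]
    policies = [np.full(n, 1.0 / n)]  # assumption: level 0 plays uniformly at random
    for _ in range(k):
        # Level j is trained only against level j-1, never against itself.
        policies.append(best_response(payoff, policies[-1]))
    return policies


# Hypothetical symmetric team game: actions 0 and 1 pay 1.0 only if both players
# pick the same one; action 2 is a "safe" choice that never pays 0.
payoff = np.array([
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 0.8],
])

levels = k_level_policies(payoff, k=3)
print([int(np.argmax(p)) for p in levels[1:]])  # → [2, 2, 2]
```

In this toy game, self-play is free to settle on action 0 or action 1 (the only 1.0 payoffs), which is an arbitrary convention. The level-k chain instead lands on action 2, which never yields 0 whatever the partner plays. The example only illustrates the recursion; it says nothing about how the method performs in Hanabi.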

Exploration-exploitation in multi-agent learning: Catastrophe theory meets game theory

S Leonardos, G Piliouras - Artificial Intelligence, 2022 - Elsevier
Exploration-exploitation is a powerful and practical tool in multi-agent learning (MAL);
however, its effects are far from understood. To make progress in this direction, we study a …

Iteratively learn diverse strategies with state distance information

W Fu, W Du, J Li, S Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
In complex reinforcement learning (RL) problems, policies with similar rewards may have
substantially different behaviors. It remains a fundamental challenge to optimize rewards …

Continuously discovering novel strategies via reward-switching policy optimization

Z Zhou, W Fu, B Zhang, Y Wu - arXiv preprint arXiv:2204.02246, 2022 - arxiv.org
We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse
strategies in complex RL environments by iteratively finding novel policies that are both …