Meta-reward-net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc
Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

UPDeT: Universal multi-agent reinforcement learning via policy decoupling with transformers

S Hu, F Zhu, X Chang, X Liang - ar…
… reinforcement learning algorithms that satisfy safety constraints is becoming
increasingly important in real-world applications. In multi-agent reinforcement learning …

MATE: Benchmarking multi-agent reinforcement learning in distributed target coverage control

X Pan, M Liu, F Zhong, Y Yang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We introduce the Multi-Agent Tracking Environment (MATE), a novel multi-agent
environment that simulates the target coverage control problems in the real world. MATE hosts …

Toward More Human-Like AI Communication: A Review of Emergent Communication Research

N Brandizzi - IEEE Access, 2023 - ieeexplore.ieee.org
In the recent shift towards human-centric AI, the need for machines to accurately use natural
language has become increasingly important. While a common approach to achieve this is …

Multi-agent determinantal Q-learning

Y Yang, Y Wen, J Wang, L Chen… - International …, 2020 - proceedings.mlr.press
Centralized training with decentralized execution has become an important paradigm in
multi-agent learning. Though practical, current methods rely on restrictive assumptions to …

Modelling bounded rationality in multi-agent interactions by generalized recursive reasoning

Y Wen, Y Yang, R Luo, J Wang - arXiv preprint arXiv:1901.09216, 2019 - arxiv.org
Though limited in real-world decision making, most multi-agent reinforcement learning
(MARL) models assume perfectly rational agents--a property hardly met due to individual's …

Offline pre-trained multi-agent decision transformer: One big sequence model tackles all SMAC tasks

L Meng, M Wen, Y Yang, C Le, X Li, W Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning leverages previously-collected offline datasets to learn
optimal policies with no necessity to access the real environment. Such a paradigm is also …