Meta-reward-net: Implicitly differentiable reward learning for preference-based reinforcement learning
Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …
UPDeT: Universal multi-agent reinforcement learning via policy decoupling with transformers
MATE: Benchmarking multi-agent reinforcement learning in distributed target coverage control
We introduce the Multi-Agent Tracking Environment (MATE), a novel multi-agent
environment that simulates target coverage control problems in the real world. MATE hosts …
Toward More Human-Like AI Communication: A Review of Emergent Communication Research
N Brandizzi - IEEE Access, 2023 - ieeexplore.ieee.org
In the recent shift towards human-centric AI, the need for machines to accurately use natural
language has become increasingly important. While a common approach to achieve this is …
Multi-agent determinantal Q-learning
Centralized training with decentralized execution has become an important paradigm in
multi-agent learning. Though practical, current methods rely on restrictive assumptions to …
Modelling bounded rationality in multi-agent interactions by generalized recursive reasoning
Though limited in real-world decision making, most multi-agent reinforcement learning
(MARL) models assume perfectly rational agents--a property hardly met due to individual's …
Offline pre-trained multi-agent decision transformer: One big sequence model tackles all SMAC tasks
Offline reinforcement learning leverages previously-collected offline datasets to learn
optimal policies with no necessity to access the real environment. Such a paradigm is also …