An overview of multi-agent reinforcement learning from game theoretical perspective
Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
[BOOK][B] A concise introduction to decentralized POMDPs
FA Oliehoek, C Amato - 2016 - Springer
This book presents an overview of formal decision making methods for decentralized
cooperative systems. It is aimed at graduate students and researchers in the fields of …
[BOOK][B] Algorithms for reinforcement learning
C Szepesvári - 2022 - books.google.com
Reinforcement learning is a learning paradigm concerned with learning to control a system
so as to maximize a numerical performance measure that expresses a long-term objective …
Active inference and agency: optimal control without cost functions
This paper describes a variational free-energy formulation of (partially observable) Markov
decision problems in decision making under uncertainty. We show that optimal control can …
Goal-directed decision making as probabilistic inference: a computational framework and potential neural correlates.
A Solway, MM Botvinick - Psychological review, 2012 - psycnet.apa.org
Recent work has given rise to the view that reward-based decision making is governed by
two key controllers: a habit system, which stores stimulus–response associations shaped by …
Variational policy search via trajectory optimization
In order to learn effective control policies for dynamical systems, policy search methods must
be able to discover successful executions of the desired task. While random exploration can …
Learning deep neural network policies with continuous memory states
Policy learning for partially observed control tasks requires policies that can remember
salient information from past observations. In this paper, we present a method for learning …
Program synthesis guided reinforcement learning for partially observed environments
A key challenge for reinforcement learning is solving long-horizon planning problems.
Recent work has leveraged programs to guide reinforcement learning in these settings …
[PDF][PDF] Probabilistic inference as a model of planned behavior.
M Toussaint - Künstliche Intell., 2009 - Citeseer
The problem of planning and goal-directed behavior has been addressed in computer
science for many years, typically based on classical concepts like Bellman's optimality …
PUMA: Planning under uncertainty with macro-actions
Planning in large, partially observable domains is challenging, especially when a long-
horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that …