Information-directed pessimism for offline reinforcement learning

A Koppel, S Bhatt, J Guo, J Eappen… - … on Machine Learning, 2024 - openreview.net
Policy optimization from batch data, i.e., offline reinforcement learning (RL), is important when
collecting data from a current policy is not possible. This setting incurs distribution mismatch …

Asymmetric actor-critic with approximate information state

A Sinha, A Mahajan - 2023 62nd IEEE Conference on Decision …, 2023 - ieeexplore.ieee.org
Reinforcement learning (RL) for partially observable Markov decision processes (POMDPs)
is a challenging problem because decisions need to be made based on the entire history of …

Decentralized Learning of Finite-Memory Policies in Dec-POMDPs

W Mao, K Zhang, Z Yang, T Başar - IFAC-PapersOnLine, 2023 - Elsevier
Multi-agent reinforcement learning (MARL) under partial observability is notoriously
challenging as the agents only have asymmetric partial observations of the system. In this …

Multi-agent reinforcement learning for nonzero-sum Markov games

W Mao - 2024 - ideals.illinois.edu
In recent years, multi-agent reinforcement learning (MARL) has shown remarkable
capabilities in addressing sequential decision-making problems that involve the strategic …