MAVEN: Multi-agent variational exploration
Centralised training with decentralised execution is an important setting for cooperative
deep multi-agent reinforcement learning due to communication constraints during execution …
Constrained variational policy optimization for safe reinforcement learning
Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before
deploying them to safety-critical applications. Previous primal-dual style approaches suffer …
Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities
There are two main ways to discover or design small drug molecules. The first involves
fine-tuning existing molecules or commercially successful drugs through quantitative structure …
Deep active inference agents using Monte-Carlo methods
Active inference is a Bayesian framework for understanding biological intelligence. The
underlying theory brings together perception and action under one single imperative …
Deep active inference as variational policy gradients
B Millidge - Journal of Mathematical Psychology, 2020 - Elsevier
Active Inference is a theory arising from theoretical neuroscience which casts action and
planning as Bayesian inference problems to be solved by minimizing a single quantity—the …
Leverage the average: an analysis of KL regularization in reinforcement learning
Recent Reinforcement Learning (RL) algorithms making use of Kullback-Leibler
(KL) regularization as a core component have shown outstanding performance. Yet, only …
Posterior sampling with delayed feedback for reinforcement learning with linear function approximation
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …
Adversarial Binaries: AI-guided Instrumentation Methods for Malware Detection Evasion
L Koch, E Begoli - ACM Computing Surveys, 2025 - dl.acm.org
Adversarial binaries are executable files that have been altered without loss of function by
an AI agent in order to deceive malware detection systems. Progress in this emergent vein of …
Iterated reasoning with mutual information in cooperative and byzantine decentralized teaming
Information sharing is key in building team cognition and enables coordination and
cooperation. High-performing human teams also benefit from acting strategically with …
Coherent soft imitation learning
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …