LightZero: A unified benchmark for Monte Carlo tree search in general sequential decision scenarios
Building agents based on tree-search planning capabilities with learned models has
achieved remarkable success in classic decision-making problems, such as Go and Atari …
Pgx: Hardware-accelerated parallel game simulators for reinforcement learning
We propose Pgx, a suite of board game reinforcement learning (RL) environments written in
JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and …
Revisiting simple regret: Fast rates for returning a good arm
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …
Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge
Process synthesis in chemical engineering is a complex planning problem due to vast
search spaces, continuous parameters and the need for generalization. Deep reinforcement …
Learning to find proofs and theorems by learning to refine search strategies: The case of loop invariant synthesis
J Laurent, A Platzer - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose a new approach to automated theorem proving where an AlphaZero-style agent
is self-training to refine a generic high-level expert strategy expressed as a nondeterministic …
Accelerating Monte Carlo tree search with probability tree state abstraction
Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have
achieved superhuman performance in many challenging tasks. However, the computational …
Thinker: Learning to plan and act
We propose the Thinker algorithm, a novel approach that enables reinforcement learning
agents to autonomously interact with and utilize a learned world model. The Thinker …
Opponent Modeling with In-context Search
Opponent modeling is a longstanding research topic aimed at enhancing decision-making
by modeling information about opponents in multi-agent environments. However, existing …
Deep controlled learning for inventory control
The application of Deep Reinforcement Learning (DRL) to inventory management
is an emerging field. However, traditional DRL algorithms, originally developed for diverse …
A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning
Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by
stochasticity in training and sensitivity of agent performance to the behavior of other agents …