LightZero: A unified benchmark for Monte Carlo tree search in general sequential decision scenarios

Y Niu, Y Pu, Z Yang, X Li, T Zhou… - Advances in …, 2024 - proceedings.neurips.cc
Building agents based on tree-search planning capabilities with learned models has
achieved remarkable success in classic decision-making problems, such as Go and Atari …

Pgx: Hardware-accelerated parallel game simulators for reinforcement learning

S Koyamada, S Okano, S Nishimori… - Advances in …, 2024 - proceedings.neurips.cc
We propose Pgx, a suite of board game reinforcement learning (RL) environments written in
JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and …

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - … on Machine Learning, 2023 - proceedings.mlr.press
Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …

Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge

Q Göttl, J Pirnay, J Burger, DG Grimm - Computers & Chemical …, 2025 - Elsevier
Process synthesis in chemical engineering is a complex planning problem due to vast
search spaces, continuous parameters and the need for generalization. Deep reinforcement …

Learning to find proofs and theorems by learning to refine search strategies: The case of loop invariant synthesis

J Laurent, A Platzer - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose a new approach to automated theorem proving where an AlphaZero-style agent
is self-training to refine a generic high-level expert strategy expressed as a nondeterministic …

Accelerating Monte Carlo tree search with probability tree state abstraction

Y Fu, M Sun, B Nie, Y Gao - Advances in Neural …, 2023 - proceedings.neurips.cc
Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have
achieved superhuman performance in many challenging tasks. However, the computational …

Thinker: Learning to plan and act

S Chung, I Anokhin, D Krueger - Advances in Neural …, 2023 - proceedings.neurips.cc
We propose the Thinker algorithm, a novel approach that enables reinforcement learning
agents to autonomously interact with and utilize a learned world model. The Thinker …

Opponent Modeling with In-context Search

Y Jing, B Liu, K Li, Y Zang, H Fu, Q Fu… - Advances in …, 2025 - proceedings.neurips.cc
Opponent modeling is a longstanding research topic aimed at enhancing decision-making
by modeling information about opponents in multi-agent environments. However, existing …

Deep controlled learning for inventory control

T Temizöz, C Imdahl, R Dijkman… - European Journal of …, 2025 - Elsevier
The application of Deep Reinforcement Learning (DRL) to inventory management
is an emerging field. However, traditional DRL algorithms, originally developed for diverse …

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Z Li, MP Wellman - arXiv preprint arXiv:2405.00243, 2024 - arxiv.org
Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by
stochasticity in training and sensitivity of agent performance to the behavior of other agents …