Google Scholar

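Lightzero: A unified benchmark for monte carlo tree search in general sequential decision scenarios

Y Niu, Y Pu, Z Yang, X Li, T Zhou… - Advances in …, 2024 - proceedings.neurips.cc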

Building agents based on tree-search planning capabilities with learned models has
achieved remarkable success in classic decision-making problems, such as Go and Atari …

Cited by 15 - All 5 versions

[PDF] neurips.cc

Pgx: Hardware-accelerated parallel game simulators for reinforcement learning

S Koyamada, S Okano, S Nishimori… - Advances in …, 2024 - proceedings.neurips.cc

We propose Pgx, a suite of board game reinforcement learning (RL) environments written in
JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and …

Cited by 26 - All 5 versions

[PDF] mlr.press

Revisiting simple regret: Fast rates for returning a good arm

Y Zhao, C Stephens, C Szepesvári… - … on Machine Learning, 2023 - proceedings.mlr.press

Simple regret is a natural and parameter-free performance criterion for pure exploration in
multi-armed bandits yet is less popular than the probability of missing the best arm or an …

Cited by 15 - All 6 versions

[HTML] sciencedirect.com

Deep reinforcement learning enables conceptual design of processes for separating azeotropic mixtures without prior knowledge

Q Göttl, J Pirnay, J Burger, DG Grimm - Computers & Chemical …, 2025 - Elsevier

Process synthesis in chemical engineering is a complex planning problem due to vast
search spaces, continuous parameters and the need for generalization. Deep reinforcement …

Cited by 2

[PDF] neurips.cc

Learning to find proofs and theorems by learning to refine search strategies: The case of loop invariant synthesis

J Laurent, A Platzer - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We propose a new approach to automated theorem proving where an AlphaZero-style agent
is self-training to refine a generic high-level expert strategy expressed as a nondeterministic …

Cited by 12 - All 9 versions

[PDF] neurips.cc

Accelerating monte carlo tree search with probability tree state abstraction

Y Fu, M Sun, B Nie, Y Gao - Advances in Neural …, 2023 - proceedings.neurips.cc

Monte Carlo Tree Search (MCTS) algorithms such as AlphaGo and MuZero have
achieved superhuman performance in many challenging tasks. However, the computational …

Cited by 3 - All 5 versions

[PDF] neurips.cc

Thinker: Learning to plan and act

S Chung, I Anokhin, D Krueger - Advances in Neural …, 2023 - proceedings.neurips.cc

We propose the Thinker algorithm, a novel approach that enables reinforcement learning
agents to autonomously interact with and utilize a learned world model. The Thinker …

Cited by 3 - All 6 versions

[PDF] neurips.cc

Opponent Modeling with In-context Search

Y Jing, B Liu, K Li, Y Zang, H Fu, Q Fu… - Advances in …, 2025 - proceedings.neurips.cc

Opponent modeling is a longstanding research topic aimed at enhancing decision-making
by modeling information about opponents in multi-agent environments. However, existing …

All 3 versions

[HTML] sciencedirect.com

Deep controlled learning for inventory control

T Temizöz, C Imdahl, R Dijkman… - European Journal of …, 2025 - Elsevier

The application of Deep Reinforcement Learning (DRL) to inventory management
is an emerging field. However, traditional DRL algorithms, originally developed for diverse …

Cited by 22 - All 2 versions

[PDF] arxiv.org

A Meta-Game Evaluation Framework for Deep Multiagent Reinforcement Learning

Z Li, MP Wellman - arXiv preprint arXiv:2405.00243, 2024 - arxiv.org

Evaluating deep multiagent reinforcement learning (MARL) algorithms is complicated by
stochasticity in training and sensitivity of agent performance to the behavior of other agents …

Cited by 3 - All 4 versions


Policy improvement by planning with Gumbel

Lightzero: A unified benchmark for monte carlo tree search in general sequential decision scenarios
