Google Scholar

Reinforcement learning based recommender systems: A survey

MM Afsar, T Crump, B Far - ACM Computing Surveys, 2022 - dl.acm.org

Recommender systems (RSs) have become an inseparable part of our everyday lives. They
help us find our favorite items to purchase, our friends on social networks, and our favorite …

Cited by: 542. All versions: 3

[PDF] arxiv.org

Integrated task and motion planning

CR Garrett, R Chitnis, R Holladay, B Kim… - Annual review of …, 2021 - annualreviews.org

The problem of planning for a robot that operates in environments containing a large
number of objects, taking actions to move itself through the world as well as to change the …

Cited by: 574. All versions: 8

[PDF] neurips.cc

Large language models as commonsense knowledge for large-scale task planning

Z Zhao, WS Lee, D Hsu - Advances in Neural Information …, 2023 - proceedings.neurips.cc

Large-scale task planning is a major challenge. Recent work exploits large language
models (LLMs) directly as a policy and shows surprisingly interesting results. This paper …

Cited by: 181. All versions: 7

[PDF] arxiv.org

Reasoning with language model is planning with world model

S Hao, Y Gu, H Ma, JJ Hong, Z Wang, DZ Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have shown remarkable reasoning capabilities, especially
when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT) …

Cited by: 421. All versions: 7

[PDF] aclanthology.org

Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations

P Wang, L Li, Z Shao, R Xu, D Dai, Y Li… - Proceedings of the …, 2024 - aclanthology.org

In this paper, we present an innovative process-oriented math process reward model called
Math-Shepherd, which assigns a reward score to each step of math problem solutions. The …

Cited by: 122

[PDF] arxiv.org

Scaling LLM test-time compute optimally can be more effective than scaling model parameters

C Snell, J Lee, K Xu, A Kumar - arXiv preprint arXiv:2408.03314, 2024 - arxiv.org

Enabling LLMs to improve their outputs by using more test-time computation is a critical step
towards building generally self-improving agents that can operate on open-ended natural …

Cited by: 154. All versions: 2

[PDF] neurips.cc

Mastering Atari games with limited data

W Ye, S Liu, T Kurutach, P Abbeel… - Advances in neural …, 2021 - proceedings.neurips.cc

Reinforcement learning has achieved great success in many applications. However, sample
efficiency remains a key challenge, with prominent methods requiring millions (or even …

Cited by: 259. All versions: 7

[PDF] arxiv.org

AlphaZero-like tree-search can guide large language model decoding and training

X Feng, Z Wan, M Wen, SM McAleer, Y Wen… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step …

Cited by: 75. All versions: 5

[PDF] arxiv.org

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by: 1711. All versions: 8

[PDF] arxiv.org

Mastering Atari, Go, chess and shogi by planning with a learned model

J Schrittwieser, I Antonoglou, T Hubert, K Simonyan… - Nature, 2020 - nature.com

Constructing agents with planning capabilities has long been one of the main challenges in
the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge …

Cited by: 2733. All versions: 15

Saved to your library

Bandit based Monte-Carlo planning

Reinforcement learning based recommender systems: A survey

Integrated task and motion planning

Large language models as commonsense knowledge for large-scale task planning

Reasoning with language model is planning with world model

Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations

Scaling LLM test-time compute optimally can be more effective than scaling model parameters

Mastering Atari games with limited data

AlphaZero-like tree-search can guide large language model decoding and training

Multi-agent reinforcement learning: A selective overview of theories and algorithms

Mastering Atari, Go, chess and shogi by planning with a learned model