Reinforcement learning based recommender systems: A survey

MM Afsar, T Crump, B Far - ACM Computing Surveys, 2022 - dl.acm.org
Recommender systems (RSs) have become an inseparable part of our everyday lives. They
help us find our favorite items to purchase, our friends on social networks, and our favorite …

Integrated task and motion planning

CR Garrett, R Chitnis, R Holladay, B Kim… - Annual review of …, 2021 - annualreviews.org
The problem of planning for a robot that operates in environments containing a large
number of objects, taking actions to move itself through the world as well as to change the …

Large language models as commonsense knowledge for large-scale task planning

Z Zhao, WS Lee, D Hsu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Large-scale task planning is a major challenge. Recent work exploits large language
models (LLMs) directly as a policy and shows surprisingly interesting results. This paper …

Reasoning with language model is planning with world model

S Hao, Y Gu, H Ma, JJ Hong, Z Wang, DZ Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown remarkable reasoning capabilities, especially
when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT) …

Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations

P Wang, L Li, Z Shao, R Xu, D Dai, Y Li… - Proceedings of the …, 2024 - aclanthology.org
In this paper, we present an innovative process-oriented math process reward model called
Math-Shepherd, which assigns a reward score to each step of math problem solutions. The …

Scaling LLM test-time compute optimally can be more effective than scaling model parameters

C Snell, J Lee, K Xu, A Kumar - arXiv preprint arXiv:2408.03314, 2024 - arxiv.org
Enabling LLMs to improve their outputs by using more test-time computation is a critical step
towards building generally self-improving agents that can operate on open-ended natural …

Mastering Atari games with limited data

W Ye, S Liu, T Kurutach, P Abbeel… - Advances in neural …, 2021 - proceedings.neurips.cc
Reinforcement learning has achieved great success in many applications. However, sample
efficiency remains a key challenge, with prominent methods requiring millions (or even …

AlphaZero-like tree-search can guide large language model decoding and training

X Feng, Z Wan, M Wen, SM McAleer, Y Wen… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step …

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Mastering Atari, Go, chess and shogi by planning with a learned model

J Schrittwieser, I Antonoglou, T Hubert, K Simonyan… - Nature, 2020 - nature.com
Constructing agents with planning capabilities has long been one of the main challenges in
the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge …