Reinforcement learning based recommender systems: A survey
Recommender systems (RSs) have become an inseparable part of our everyday lives. They
help us find our favorite items to purchase, our friends on social networks, and our favorite …
Integrated task and motion planning
The problem of planning for a robot that operates in environments containing a large
number of objects, taking actions to move itself through the world as well as to change the …
Large language models as commonsense knowledge for large-scale task planning
Large-scale task planning is a major challenge. Recent work exploits large language
models (LLMs) directly as a policy and shows surprisingly interesting results. This paper …
Reasoning with language model is planning with world model
Large language models (LLMs) have shown remarkable reasoning capabilities, especially
when prompted to generate intermediate reasoning steps (e.g., Chain-of-Thought, CoT) …
Math-Shepherd: Verify and reinforce LLMs step-by-step without human annotations
In this paper, we present an innovative process-oriented math process reward model called
Math-Shepherd, which assigns a reward score to each step of math problem solutions. The …
Scaling LLM test-time compute optimally can be more effective than scaling model parameters
Enabling LLMs to improve their outputs by using more test-time computation is a critical step
towards building generally self-improving agents that can operate on open-ended natural …
Mastering Atari games with limited data
Reinforcement learning has achieved great success in many applications. However, sample
efficiency remains a key challenge, with prominent methods requiring millions (or even …
AlphaZero-like tree-search can guide large language model decoding and training
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step …
Multi-agent reinforcement learning: A selective overview of theories and algorithms
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …
Mastering Atari, Go, chess and shogi by planning with a learned model
Constructing agents with planning capabilities has long been one of the main challenges in
the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge …