A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
Exploration in deep reinforcement learning: A survey
This paper reviews exploration techniques in deep reinforcement learning. Exploration
techniques are of primary importance when solving sparse reward problems. In sparse …
Emergent tool use from multi-agent autocurricula
Through multi-agent competition, the simple objective of hide-and-seek, and standard
reinforcement learning algorithms at scale, we find that agents create a self-supervised …
Towards continual reinforcement learning: A review and perspectives
In this article, we aim to provide a literature review of different formulations and approaches
to continual reinforcement learning (RL), also known as lifelong or non-stationary RL. We …
Planning to explore via self-supervised world models
Reinforcement learning allows solving complex tasks; however, the learning tends to be task-
specific and the sample efficiency remains a challenge. We present Plan2Explore, a self …
Exploration by random network distillation
We introduce an exploration bonus for deep reinforcement learning methods that is easy to
implement and adds minimal overhead to the computation performed. The bonus is the error …
Model-based reinforcement learning: A survey
Sequential decision making, commonly formalized as Markov Decision Process (MDP)
optimization, is an important challenge in artificial intelligence. Two key approaches to this …
Large-scale study of curiosity-driven learning
Reinforcement learning algorithms rely on carefully engineering environment rewards that
are extrinsic to the agent. However, annotating each environment with hand-designed …
APS: Active pretraining with successor features
We introduce a new unsupervised pretraining objective for reinforcement learning. During
the unsupervised reward-free pretraining phase, the agent maximizes mutual information …
Self-supervised exploration via disagreement
Efficient exploration is a long-standing problem in sensorimotor learning. Major advances
have been demonstrated in noise-free, non-stochastic domains such as video games and …