Video pretraining (VPT): Learning to act by watching unlabeled online videos

B Baker, I Akkaya, P Zhokov… - Advances in …, 2022 - proceedings.neurips.cc
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …

Auto MC-Reward: Automated dense reward design with large language models for Minecraft

H Li, X Yang, Z Wang, X Zhu, J Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that
indicate task completion or failure with binary values. The challenge in exploration efficiency …

Plan4MC: Skill reinforcement learning and planning for open-world Minecraft tasks

PKU BAAI - arXiv preprint arXiv:2303.16563, 2023 - researchgate.net
We study building a multi-task agent in Minecraft. Without human demonstrations, solving
long-horizon tasks in this open-ended environment with reinforcement learning (RL) is …

OMNI: Open-endedness via models of human notions of interestingness

J Zhang, J Lehman, K Stanley, J Clune - arXiv preprint arXiv:2306.01711, 2023 - arxiv.org
Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast
environment search space, but there are thus infinitely many possible tasks. Even after …

Skill reinforcement learning and planning for open-world long-horizon tasks

H Yuan, C Zhang, H Wang, F Xie, P Cai… - arXiv preprint arXiv …, 2023 - arxiv.org
We study building multi-task agents in open-world environments. Without human
demonstrations, learning to accomplish long-horizon tasks in a large open-world …

Learning Curricula in Open-Ended Worlds

M Jiang - arXiv preprint arXiv:2312.03126, 2023 - arxiv.org
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential
decision-making agents. As collecting real-world interactions can entail additional costs and …

Hieros: Hierarchical Imagination on Structured State Space Sequence World Models

P Mattes, R Schlosser, R Herbrich - arXiv preprint arXiv:2310.05167, 2023 - arxiv.org
One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is
sample efficiency. Many approaches learn a world model in order to train an agent entirely …