Video PreTraining (VPT): Learning to act by watching unlabeled online videos
Pretraining on noisy, internet-scale datasets has been heavily studied as a technique for
training models with broad, general capabilities for text, images, and other modalities …
Auto MC-Reward: Automated dense reward design with large language models for Minecraft
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that
indicate task completion or failure with binary values. The challenge in exploration efficiency …
Plan4MC: Skill reinforcement learning and planning for open-world Minecraft tasks
PKU BAAI - arXiv preprint arXiv:2303.16563, 2023 - researchgate.net
We study building a multi-task agent in Minecraft. Without human demonstrations, solving
long-horizon tasks in this open-ended environment with reinforcement learning (RL) is …
OMNI: Open-endedness via models of human notions of interestingness
Open-ended algorithms aim to learn new, interesting behaviors forever. That requires a vast
environment search space, but there are thus infinitely many possible tasks. Even after …
Skill reinforcement learning and planning for open-world long-horizon tasks
We study building multi-task agents in open-world environments. Without human
demonstrations, learning to accomplish long-horizon tasks in a large open-world …
Learning Curricula in Open-Ended Worlds
M Jiang - arXiv preprint arXiv:2312.03126, 2023 - arxiv.org
Deep reinforcement learning (RL) provides powerful methods for training optimal sequential
decision-making agents. As collecting real-world interactions can entail additional costs and …
Hieros: Hierarchical Imagination on Structured State Space Sequence World Models
One of the biggest challenges to modern deep reinforcement learning (DRL) algorithms is
sample efficiency. Many approaches learn a world model in order to train an agent entirely …