Training language models to follow instructions with human feedback

L Ouyang, J Wu, X Jiang, D Almeida… - Advances in neural …, 2022 - proceedings.neurips.cc
Making language models bigger does not inherently make them better at following a user's
intent. For example, large language models can generate outputs that are untruthful, toxic, or …

Perceiver IO: A general architecture for structured inputs & outputs

A Jaegle, S Borgeaud, JB Alayrac, C Doersch… - arXiv preprint arXiv …, 2021 - arxiv.org
A central goal of machine learning is the development of systems that can solve many
problems in as many data domains as possible. Current architectures, however, cannot be …

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

Frozen pretrained transformers as universal computation engines

K Lu, A Grover, P Abbeel, I Mordatch - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We investigate the capability of a transformer pretrained on natural language to generalize
to other modalities with minimal finetuning--in particular, without finetuning of the self …

Language-conditioned learning for robotic manipulation: A survey

H Zhou, X Yao, Y Meng, S Sun, Z Bing, K Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Language-conditioned robotic manipulation represents a cutting-edge area of research,
enabling seamless communication and cooperation between humans and robotic agents …

Collaborating with humans without human data

DJ Strouse, K McKee, M Botvinick… - Advances in …, 2021 - proceedings.neurips.cc
Collaborating with humans requires rapidly adapting to their individual strengths,
weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement …

Natural language instructions induce compositional generalization in networks of neurons

R Riveland, A Pouget - Nature Neuroscience, 2024 - nature.com
A fundamental human cognitive feat is to interpret linguistic instructions in order to perform
novel tasks without explicit task experience. Yet, the neural computations that might be used …

Vision-language models as success detectors

Y Du, K Konyushkova, M Denil, A Raju… - arXiv preprint arXiv …, 2023 - arxiv.org
Detecting successful behaviour is crucial for training intelligent agents. As such,
generalisable reward models are a prerequisite for agents that can learn to generalise their …

Habitat-web: Learning embodied object-search strategies from human demonstrations at scale

R Ramrakhya, E Undersander… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a large-scale study of imitating human demonstrations on tasks that require a
virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g. 'find & …

PIRLNav: Pretraining with imitation and RL finetuning for ObjectNav

R Ramrakhya, D Batra, E Wijmans… - Proceedings of the …, 2023 - openaccess.thecvf.com
We study ObjectGoal Navigation--where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that imitation learning …