A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

On transforming reinforcement learning with transformers: The development trajectory

S Hu, L Shen, Y Zhang, Y Chen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Transformers, originally devised for natural language processing (NLP), have also produced
significant successes in computer vision (CV). Due to their strong expression power …

RT-1: Robotics transformer for real-world control at scale

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …

Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions

Y Chebotar, Q Vuong, K Hausman… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

History aware multimodal transformer for vision-and-language navigation

S Chen, PL Guhur, C Schmid… - Advances in neural …, 2021 - proceedings.neurips.cc
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …

Think global, act local: Dual-scale graph transformer for vision-and-language navigation

S Chen, PL Guhur, M Tapaswi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …

Spatio-temporal graph transformer networks for pedestrian trajectory prediction

C Yu, X Ma, J Ren, H Zhao, S Yi - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
Understanding crowd motion dynamics is critical to real-world applications, e.g., surveillance
systems and autonomous driving. This is challenging because it requires effectively …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

PONI: Potential functions for ObjectGoal navigation with interaction-free learning

SK Ramakrishnan, DS Chaplot… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art approaches to ObjectGoal navigation (ObjectNav) rely on reinforcement
learning and typically require significant computational resources and time for learning. We …