Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

Deep reinforcement learning for robotics: A survey of real-world successes

C Tang, B Abbatematteo, J Hu… - Annual Review of …, 2024 - annualreviews.org
Reinforcement learning (RL), particularly its combination with deep neural networks,
referred to as deep RL (DRL), has shown tremendous promise across a wide range of …

Habitat 2.0: Training home assistants to rearrange their habitat

A Szot, A Clegg, E Undersander… - Advances in neural …, 2021 - proceedings.neurips.cc
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in
interactive 3D environments and complex physics-enabled scenarios. We make …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

ViNT: A foundation model for visual navigation

D Shah, A Sridhar, N Dashora, K Stachowicz… - arXiv preprint arXiv …, 2023 - arxiv.org
General-purpose pre-trained models ("foundation models") have enabled practitioners to
produce generalizable solutions for individual machine learning problems with datasets that …

Nomad: Goal masked diffusion policies for navigation and exploration

A Sridhar, D Shah, C Glossop… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Robotic learning for navigation in unfamiliar environments needs to provide policies for both
task-oriented navigation (i.e., reaching a goal that the robot has located) and task-agnostic …

Simple but effective: CLIP embeddings for embodied AI

A Khandelwal, L Weihs, R Mottaghi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial
for a range of visual tasks from classification and detection to captioning and image …

A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Parallel learning: Overview and perspective for computational learning across Syn2Real and Sim2Real

Q Miao, Y Lv, M Huang, X Wang… - IEEE/CAA Journal of …, 2023 - ieeexplore.ieee.org
The virtual-to-real paradigm, i.e., training models on virtual data and then applying them to
solve real-world problems, has attracted more and more attention from various domains by …

Habitat synthetic scenes dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation

M Khanna, Y Mao, H Jiang, S Haresh… - Proceedings of the …, 2024 - openaccess.thecvf.com
We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D
scenes, and use it to test navigation agent generalization to realistic 3D environments. Our …