Tool learning with foundation models

Y Qin, S Hu, Y Lin, W Chen, N Ding, G Cui… - ACM Computing …, 2024 - dl.acm.org
Humans possess an extraordinary ability to create and utilize tools. With the advent of
foundation models, artificial intelligence systems have the potential to be equally adept in …

A survey of embodied AI: From simulators to research tasks

J Duan, S Yu, HL Tan, H Zhu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …

Diffusion-based generation, optimization, and planning in 3D scenes

S Huang, Z Wang, P Li, B Jia, T Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce SceneDiffuser, a conditional generative model for 3D scene understanding.
SceneDiffuser provides a unified model for solving scene-conditioned generation …

Drive like a human: Rethinking autonomous driving with large language models

D Fu, X Li, L Wen, M Dou, P Cai… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we explore the potential of using a large language model (LLM) to understand
the driving environment in a human-like manner and analyze its ability to reason, interpret …

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

PONI: Potential functions for ObjectGoal navigation with interaction-free learning

SK Ramakrishnan, DS Chaplot… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art approaches to ObjectGoal navigation (ObjectNav) rely on reinforcement
learning and typically require significant computational resources and time for learning. We …

Habitat-Web: Learning embodied object-search strategies from human demonstrations at scale

R Ramrakhya, E Undersander… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a large-scale study of imitating human demonstrations on tasks that require a
virtual robot to search for objects in new environments: (1) ObjectGoal Navigation (e.g., 'find & …

Bird's-Eye-View Scene Graph for Vision-Language Navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

PIRLNav: Pretraining with imitation and RL finetuning for ObjectNav

R Ramrakhya, D Batra, E Wijmans… - Proceedings of the …, 2023 - openaccess.thecvf.com
We study ObjectGoal Navigation, where a virtual robot situated in a new
environment is asked to navigate to an object. Prior work has shown that imitation learning …

Occupancy anticipation for efficient exploration and navigation

SK Ramakrishnan, Z Al-Halah, K Grauman - Computer Vision–ECCV 2020 …, 2020 - Springer
State-of-the-art navigation methods leverage a spatial memory to generalize to new
environments, but their occupancy maps are limited to capturing the geometric structures …