Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
Embodied navigation with multi-modal information: A survey from tasks to methodology
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …
ShapeLLM: Universal 3D object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
Multi3DRefer: Grounding text description to multiple 3D objects
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …
OK-Robot: What really matters in integrating open-knowledge models for robotics
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …
TinyVLA: Towards fast, data-efficient vision-language-action models for robotic manipulation
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor
control and instruction comprehension through end-to-end learning processes. However …
Vision-and-language navigation today and tomorrow: A survey in the era of foundation models
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance its development. The remarkable …
GOAT-Bench: A benchmark for multi-modal lifelong navigation
The Embodied AI community has recently made significant strides in visual navigation tasks
exploring targets specified by 3D coordinates, objects, language descriptions, and images. However …
Adaptive mobile manipulation for articulated objects in the open world
Deploying robots in open-ended unstructured environments such as homes has been a long-
standing research problem. However, robots are often studied only in closed-off lab settings …
PoliFormer: Scaling on-policy RL with transformers results in masterful navigators
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained
end-to-end with reinforcement learning at scale that generalizes to the real-world without …