Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
A survey on integration of large language models with intelligent robots
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …
Lerf: Language embedded radiance fields
Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …
Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …
Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians
We present GaussianAvatar, an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …
Segment anything in 3d with nerfs
Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model capable of segmenting anything in 2D images. This paper aims to …
Openshape: Scaling up 3d shape representation towards open-world understanding
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
Gnfactor: Multi-task real robot learning with generalizable neural feature fields
It is a long-standing problem in robotics to develop agents capable of executing diverse
manipulation tasks from visual observations in unstructured real-world environments. To …
Langsplat: 3d language gaussian splatting
Humans live in a 3D world and commonly use natural language to interact with a 3D scene.
Modeling a 3D language field to support open-ended language queries in 3D has gained …
Toward general-purpose robots via foundation models: A survey and meta-analysis
Building general-purpose robots that operate seamlessly in any environment, with any
object, and utilizing various skills to complete diverse tasks has been a long-standing goal in …