Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
A survey on integration of large language models with intelligent robots
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …
Lerf: Language embedded radiance fields
Humans describe the physical world using natural language to refer to specific 3D locations
based on a vast range of properties: visual appearance, semantics, abstract associations, or …
Gaussian grouping: Segment and edit anything in 3d scenes
Abstract The recent Gaussian Splatting achieves high-quality and real-time novel-view
synthesis of the 3D scenes. However, it is solely concentrated on the appearance and …
Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …
Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians
We present GaussianAvatar, an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …
Segment anything in 3d with nerfs
Abstract Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model which is capable of segmenting anything in 2D images. This paper aims to …
Physically grounded vision-language models for robotic manipulation
Recent advances in vision-language models (VLMs) have led to improved performance on
tasks such as visual question answering and image captioning. Consequently, these models …
Openshape: Scaling up 3d shape representation towards open-world understanding
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
Shapellm: Universal 3d object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …