ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …
GaussianAvatar: Towards realistic human avatar modeling from a single video via animatable 3D Gaussians
We present GaussianAvatar, an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …
Segment anything in 3D with NeRFs
Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model which is capable of segmenting anything in 2D images. This paper aims to …
OpenShape: Scaling up 3D shape representation towards open-world understanding
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …
DreamLLM: Synergistic multimodal comprehension and creation
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …
ShapeLLM: Universal 3D object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
Towards open vocabulary learning: A survey
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …
OVIR-3D: Open-vocabulary 3D instance retrieval without training on 3D data
This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D
object instance retrieval without using any 3D data for training. Given a language query, the …
RegionPLC: Regional point-language contrastive learning for open-world 3D scene understanding
We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify …
Bridging the domain gap: Self-supervised 3D scene understanding with foundation models
Foundation models have achieved remarkable results in 2D and language tasks like image
segmentation, object detection, and visual-language understanding. However, their …