Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

Gaussianavatar: Towards realistic human avatar modeling from a single video via animatable 3d gaussians

L Hu, H Zhang, Y Zhang, B Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present GaussianAvatar an efficient approach to creating realistic human avatars with
dynamic 3D appearances from a single video. We start by introducing animatable 3D …

Segment anything in 3d with nerfs

J Cen, Z Zhou, J Fang, W Shen, L **e… - Advances in …, 2023 - proceedings.neurips.cc
Abstract Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model which is capable to segment anything in 2D images. This paper aims to …

Openshape: Scaling up 3d shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2024 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …

Dreamllm: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arxiv preprint arxiv …, 2023 - arxiv.org
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

Shapellm: Universal 3d object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Ovir-3d: Open-vocabulary 3d instance retrieval without training on 3d data

S Lu, H Chang, EP **g, A Boularias… - Conference on Robot …, 2023 - proceedings.mlr.press
This work presents OVIR-3D, a straightforward yet effective method for open-vocabulary 3D
object instance retrieval without using any 3D data for training. Given a language query, the …

Regionplc: Regional point-language contrastive learning for open-world 3d scene understanding

J Yang, R Ding, W Deng, Z Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework namely RegionPLC for open-world 3D scene understanding aiming to identify …

Bridging the domain gap: Self-supervised 3d scene understanding with foundation models

Z Chen, L **g, Y Li, B Li - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Foundation models have achieved remarkable results in 2D and language tasks like image
segmentation, object detection, and visual-language understanding. However, their …