A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Segment anything in 3D with NeRFs

J Cen, Z Zhou, J Fang, W Shen, L Xie… - Advances in …, 2023 - proceedings.neurips.cc
Recently, the Segment Anything Model (SAM) emerged as a powerful vision
foundation model which is capable to segment anything in 2D images. This paper aims to …

ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

LL3DA: Visual interactive instruction tuning for omni-3D understanding, reasoning, and planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …

OpenShape: Scaling up 3D shape representation towards open-world understanding

M Liu, R Shi, K Kuang, Y Zhu, X Li… - Advances in neural …, 2023 - proceedings.neurips.cc
We introduce OpenShape, a method for learning multi-modal joint representations of text,
image, and point clouds. We adopt the commonly used multi-modal contrastive learning …

OpenMask3D: Open-vocabulary 3D instance segmentation

A Takmaz, E Fedele, RW Sumner, M Pollefeys… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the task of open-vocabulary 3D instance segmentation. Current approaches
for 3D instance segmentation can typically only recognize object categories from a pre …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

EmbodiedScan: A holistic multi-modal 3D perception suite towards embodied AI

T Wang, X Mao, C Zhu, R Xu, R Lyu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the realm of computer vision and robotics, embodied agents are expected to explore their
environment and carry out human instructions. This necessitates the ability to fully …

Language embedded 3D Gaussians for open-vocabulary scene understanding

JC Shi, M Wang, HB Duan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Open-vocabulary querying in 3D space is challenging but essential for scene understanding
tasks such as object localization and segmentation. Language-embedded scene …