Chat-Scene: Bridging 3D scene and large language models with object identifiers

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

R Lyu, T Wang, J Lin, S Yang, X Mao, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D
perception attracts more attention due to its connectivity to the physical world and makes …

PerLA: Perceptive 3D language assistant

G Mei, W Lin, L Riz, Y Wu, F Poiesi, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Enabling Large Language Models (LLMs) to understand the 3D physical world is an
emerging yet challenging research direction. Current strategies for processing point clouds …

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

J Deng, T He, L Jiang, T Wang, F Dayoub… - arXiv preprint arXiv …, 2025 - arxiv.org
Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in
3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to …

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

T Zemskova, D Yudin - arXiv preprint arXiv:2412.18450, 2024 - arxiv.org
A 3D scene graph represents a compact scene model, storing information about the objects
and the semantic relationships between them, making its use promising for robotic tasks …