Google Scholar

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net

Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

Cited by 12

[PDF] arxiv.org

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

R Xu, Z Huang, T Wang, Y Chen, J Pang… - ar** robust autonomous …

Cited by 1

[PDF] arxiv.org

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

R Lyu, T Wang, J Lin, S Yang, X Mao, Y Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

With the emergence of LLMs and their integration with other data modalities, multi-modal 3D
perception attracts more attention due to its connectivity to the physical world and makes …

Cited by 2

[PDF] arxiv.org

PerLA: Perceptive 3D language assistant

G Mei, W Lin, L Riz, Y Wu, F Poiesi, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org

Enabling Large Language Models (LLMs) to understand the 3D physical world is an
emerging yet challenging research direction. Current strategies for processing point clouds …

[PDF] arxiv.org

3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer

J Deng, T He, L Jiang, T Wang, F Dayoub… - arXiv preprint arXiv …, 2025 - arxiv.org

Current 3D Large Multimodal Models (3D LMMs) have shown tremendous potential in
3D-vision-based dialogue and reasoning. However, how to further enhance 3D LMMs to …

[PDF] arxiv.org

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

T Zemskova, D Yudin - arXiv preprint arXiv:2412.18450, 2024 - arxiv.org

A 3D scene graph represents a compact scene model, storing information about the objects
and the semantic relationships between them, making its use promising for robotic tasks …

Grounded 3D-LLM with Referent Tokens

Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers