LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
3D dense captioning stands as a cornerstone in achieving a comprehensive
understanding of 3D scenes through natural language. It has recently witnessed remarkable …
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …
Bi-directional Contextual Attention for 3D Dense Captioning
3D dense captioning is a task involving the localization of objects and the
generation of descriptions for each object in a 3D scene. Recent approaches have …
Lightweight Model Pre-Training via Language Guided Knowledge Distillation
This paper studies the problem of pre-training for small models, which is essential for many
mobile devices. Current state-of-the-art methods on this problem transfer the …