LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …

TOD3Cap: Towards 3D dense captioning in outdoor scenes

B **, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu… - … on Computer Vision, 2024 - Springer
3D dense captioning stands as a cornerstone in achieving a comprehensive
understanding of 3D scenes through natural language. It has recently witnessed remarkable …

Chat-Scene: Bridging 3D scene and large language models with object identifiers

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

Bi-directional contextual attention for 3D dense captioning

M Kim, HS Lim, S Lee, B Kim, G Kim - European Conference on Computer …, 2024 - Springer
3D dense captioning is a task involving the localization of objects and the
generation of descriptions for each object in a 3D scene. Recent approaches have …

Lightweight Model Pre-Training Via Language Guided Knowledge Distillation

M Li, L Zhang, M Zhu, Z Huang, G Yu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
This paper studies the problem of pre-training for small models, which is essential for many
mobile devices. Current state-of-the-art methods on this problem transfer the …