A Survey of Multimodal Large Language Models
Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …
Unifying 3d vision-language understanding via promptable queries
A unified model for 3D vision-language (3D-VL) understanding is expected to take various
scene representations and perform a wide range of tasks in a 3D scene. However, a …
View selection for 3d captioning via diffusion ranking
Scalable annotation approaches are crucial for constructing extensive 3D-text datasets,
facilitating a broader range of applications. However, existing methods sometimes lead to …
An embodied generalist agent in 3d world
Leveraging massive knowledge and learning schemes from large language models (LLMs),
recent machine learning models show notable successes in building generalist agents that …
Tod3cap: Towards 3d dense captioning in outdoor scenes
3D dense captioning stands as a cornerstone in achieving a comprehensive
understanding of 3D scenes through natural language. It has recently witnessed remarkable …
Motionchain: Conversational motion controllers via multimodal prompts
Recent advancements in language models have demonstrated their adeptness in
conducting multi-turn dialogues and retaining conversational context. However, this …
Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors
Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging
Large Language Models (LLMs) with images using a simple projector. Inspired by their …
A survey of label-efficient deep learning for 3D point clouds
In the past decade, deep neural networks have achieved significant progress in point cloud
learning. However, collecting large-scale precisely-annotated point clouds is extremely …
Chat-scene: Bridging 3d scene and large language models with object identifiers
Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …
Scanreason: Empowering 3d visual grounding with reasoning capabilities
Although great progress has been made in 3D visual grounding, current models still rely on
explicit textual descriptions for grounding and lack the ability to reason human intentions …