- Academic Search

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 159 · All 7 versions

Unifying 3d vision-language understanding via promptable queries

Z Zhu, Z Zhang, X Ma, X Niu, Y Chen, B Jia… - … on Computer Vision, 2024 - Springer

A unified model for 3D vision-language (3D-VL) understanding is expected to take various
scene representations and perform a wide range of tasks in a 3D scene. However, a …

Cited by 14 · All 2 versions

View selection for 3d captioning via diffusion ranking

T Luo, J Johnson, H Lee - European Conference on Computer Vision, 2024 - Springer

Scalable annotation approaches are crucial for constructing extensive 3D-text datasets,
facilitating a broader range of applications. However, existing methods sometimes lead to …

Cited by 10 · All 2 versions

An embodied generalist agent in 3d world

J Huang, S Yong, X Ma, X Linghu, P Li, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Leveraging massive knowledge and learning schemes from large language models (LLMs),
recent machine learning models show notable successes in building generalist agents that …

Cited by 89 · All 4 versions · HTML version

Tod3cap: Towards 3d dense captioning in outdoor scenes

B Jin, Y Zheng, P Li, W Li, Y Zheng, S Hu, X Liu… - … on Computer Vision, 2024 - Springer

3D dense captioning stands as a cornerstone in achieving a comprehensive
understanding of 3D scenes through natural language. It has recently witnessed remarkable …

Cited by 8 · All 2 versions

Motionchain: Conversational motion controllers via multimodal prompts

B Jiang, X Chen, C Zhang, F Yin, Z Li, G Yu… - European Conference on …, 2024 - Springer

Recent advancements in language models have demonstrated their adeptness in
conducting multi-turn dialogues and retaining conversational context. However, this …

Cited by 7 · All 2 versions

Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors

Y Tang, X Han, X Li, Q Yu, Y Hao, L Hu… - Proceedings of the 32nd …, 2024 - dl.acm.org

Large 2D vision-language models (2D-LLMs) have gained significant attention by bridging
Large Language Models (LLMs) with images using a simple projector. Inspired by their …

Cited by 13 · All 2 versions

A survey of label-efficient deep learning for 3D point clouds

A Xiao, X Zhang, L Shao, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

In the past decade, deep neural networks have achieved significant progress in point cloud
learning. However, collecting large-scale precisely-annotated point clouds is extremely …

Cited by 18 · All 4 versions

Chat-scene: Bridging 3d scene and large language models with object identifiers

H Huang, Y Chen, Z Wang, R Huang, R Xu… - The Thirty-eighth …, 2024 - openreview.net

Recent advancements in 3D Large Language Models (LLMs) have demonstrated promising
capabilities for 3D scene understanding. However, previous methods exhibit deficiencies in …

Cited by 12 · HTML version

Scanreason: Empowering 3d visual grounding with reasoning capabilities

C Zhu, T Wang, W Zhang, K Chen, X Liu - European Conference on …, 2024 - Springer

Although great progress has been made in 3D visual grounding, current models still rely on
explicit textual descriptions for grounding and lack the ability to reason human intentions …

Cited by 5 · All 6 versions
