ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Open3DIS: Open-vocabulary 3D instance segmentation with 2D mask guidance

P Nguyen, TD Ngo, E Kalogerakis… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce Open3DIS, a novel solution designed to tackle the problem of Open-
Vocabulary Instance Segmentation within 3D scenes. Objects within 3D environments …

RegionPLC: Regional point-language contrastive learning for open-world 3D scene understanding

J Yang, R Ding, W Deng, Z Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We propose a lightweight and scalable Regional Point-Language Contrastive learning
framework, namely RegionPLC, for open-world 3D scene understanding, aiming to identify …

OpenIns3D: Snap and lookup for 3D open-vocabulary instance segmentation

Z Huang, X Wu, X Chen, H Zhao, L Zhu… - European Conference on …, 2024 - Springer
In this work, we introduce OpenIns3D, a new 3D-input-only framework for 3D open-
vocabulary scene understanding. The OpenIns3D framework employs a “Mask-Snap …

Grounded 3D-LLM with referent tokens

Y Chen, S Yang, H Huang, T Wang, R Xu, R Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Prior studies on 3D scene understanding have primarily developed specialized models for
specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D …

V-IRL: Grounding Virtual Intelligence in Real Life

J Yang, R Ding, E Brown, X Qi, S Xie - European Conference on Computer …, 2024 - Springer
There is a sensory gulf between the Earth that humans inhabit and the digital realms in
which modern AI agents are created. To develop AI agents that can sense, think, and act as …

Open-vocabulary 3D semantic segmentation with text-to-image diffusion models

X Zhu, H Zhou, P Xing, L Zhao, H Xu, J Liang… - … on Computer Vision, 2024 - Springer
In this paper, we investigate the use of diffusion models which are pre-trained on large-scale
image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel …

Can 3D Vision-Language Models Truly Understand Natural Language?

W Deng, J Yang, R Ding, J Liu, Y Li, X Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Rapid advancements in 3D vision-language (3D-VL) tasks have opened up new avenues
for human interaction with embodied agents or robots using natural language. Despite this …

UniM-OV3D: Uni-modality open-vocabulary 3D scene understanding with fine-grained feature representation

Q He, J Peng, Z Jiang, K Wu, X Ji, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
3D open-vocabulary scene understanding aims to recognize arbitrary novel categories
beyond the base label space. However, existing works not only fail to fully utilize all the …