3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the 3D physical world with natural language, which is crucial for achieving embodied …
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D …
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
3D vision-language (3D-VL) grounding, which aims to align language with 3D physical environments, stands as a cornerstone in developing embodied agents. In …
OpenEQA: Embodied Question Answering in the Era of Foundation Models
We present a modern formulation of Embodied Question Answering (EQA) as the task of understanding an environment well enough to answer questions about it in natural …
NuScenes-QA: A Multi-Modal Visual Question Answering Benchmark for Autonomous Driving Scenario
We introduce a novel visual question answering (VQA) task in the context of autonomous driving, aiming to answer natural language questions based on street-view clues. Compared …
LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning
Recent progress in Large Multimodal Models (LMM) has opened up great possibilities for various applications in the field of human-machine interactions. However …