
Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - ar** multi-task embodied agents. We've …

Cited by 75 · All 3 versions

LL3DA: Visual interactive instruction tuning for omni-3D understanding, reasoning, and planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract: Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …

Cited by 65 · All 6 versions


[PDF] aaai.org

NuScenes-QA: A multi-modal visual question answering benchmark for autonomous driving scenario

T Qian, J Chen, L Zhuo, Y Jiao, YG Jiang - Proceedings of the AAAI …, 2024 - ojs.aaai.org

We introduce a novel visual question answering (VQA) task in the context of autonomous
driving, aiming to answer natural language questions based on street-view clues. Compared …

Cited by 104 · All 5 versions


[PDF] arxiv.org

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Cited by 45 · All 5 versions


[PDF] arxiv.org

An embodied generalist agent in 3D world

J Huang, S Yong, X Ma, X Linghu, P Li, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Leveraging massive knowledge from large language models (LLMs), recent machine
learning models show notable successes in general-purpose task solving in diverse …

Cited by 97 · All 7 versions


SQA3D: Situated question answering in 3D scenes

Aligning cyber space with physical world: A comprehensive survey on embodied AI
