Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

ShapeLLM: Universal 3D object understanding for embodied interaction

Z Qi, R Dong, S Zhang, H Geng, C Han, Z Ge… - … on Computer Vision, 2024 - Springer
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

OK-Robot: What really matters in integrating open-knowledge models for robotics

P Liu, Y Orru, J Vakil, C Paxton, NMM Shafiullah… - arXiv preprint arXiv …, 2024 - arxiv.org
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …

TinyVLA: Towards fast, data-efficient vision-language-action models for robotic manipulation

J Wen, Y Zhu, J Li, M Zhu, K Wu, Z Xu, N Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language-Action (VLA) models have shown remarkable potential in visuomotor
control and instruction comprehension through end-to-end learning processes. However …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years,
and many approaches have emerged to advance its development. The remarkable …

GOAT-Bench: A benchmark for multi-modal lifelong navigation

M Khanna, R Ramrakhya… - Proceedings of the …, 2024 - openaccess.thecvf.com
The Embodied AI community has recently made significant strides in visual navigation tasks,
exploring targets from 3D coordinates, objects, language descriptions, and images. However …

Adaptive mobile manipulation for articulated objects in the open world

H **ong, R Mendonca, K Shaw, D Pathak - arxiv preprint arxiv …, 2024 - arxiv.org
Deploying robots in open-ended unstructured environments such as homes has been a long-
standing research problem. However, robots are often studied only in closed-off lab settings …

PoliFormer: Scaling on-policy RL with transformers results in masterful navigators

KH Zeng, Z Zhang, K Ehsani, R Hendrix… - arXiv preprint arXiv …, 2024 - arxiv.org
We present PoliFormer (Policy Transformer), an RGB-only indoor navigation agent trained
end-to-end with reinforcement learning at scale that generalizes to the real world without …