Aligning cyberspace with the physical world: A comprehensive survey on Embodied AI

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …

NavGPT-2: Unleashing navigational reasoning capability for large vision-language models

G Zhou, Y Hong, Z Wang, XE Wang, Q Wu - European Conference on …, 2024 - Springer
Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a
burgeoning initiative to harness LLMs for instruction-following robotic navigation. Such a …

MapGPT: Map-guided prompting with adaptive path planning for vision-and-language navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - Proceedings of the …, 2024 - aclanthology.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years,
and many approaches have emerged to advance its development. The remarkable …

Large multimodal agents: A survey

J **e, Z Chen, R Zhang, X Wan, G Li - arxiv preprint arxiv:2402.15116, 2024 - arxiv.org
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …

Understanding World or Predicting Future? A Comprehensive Survey of World Models

J Ding, Y Zhang, Y Shang, Y Zhang, Z Zong… - arXiv preprint arXiv …, 2024 - arxiv.org
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …

MapGPT: Map-guided prompting for unified vision-and-language navigation

J Chen, B Lin, R Xu, Z Chai, X Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking
and decision-making abilities across various tasks. However, existing zero-shot agents for …

Sim-to-real transfer via 3D feature fields for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu, S Jiang - arXiv preprint arXiv:2406.09798, 2024 - arxiv.org
Vision-and-language navigation (VLN) enables an agent to navigate to a remote location in
a 3D environment by following natural language instructions. In this field, the agent is usually …

Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation

J Chen, B Lin, X Liu, L Ma, X Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
LLM-based agents have demonstrated impressive zero-shot performance on the vision-
language navigation (VLN) task. However, existing LLM-based methods often focus only on …

ThinkBot: Embodied instruction following with thought chain reasoning

G Lu, Z Wang, C Liu, J Lu, Y Tang - arXiv preprint arXiv:2312.07062, 2023 - arxiv.org
Embodied Instruction Following (EIF) requires agents to complete human instructions by
interacting with objects in complicated surrounding environments. Conventional methods directly …