Aligning cyber space with physical world: A comprehensive survey on embodied ai
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …
Navgpt-2: Unleashing navigational reasoning capability for large vision-language models
Capitalizing on the remarkable advancements in Large Language Models (LLMs), there is a
burgeoning initiative to harness LLMs for instruction following robotic navigation. Such a …
Mapgpt: Map-guided prompting with adaptive path planning for vision-and-language navigation
Embodied agents equipped with GPT as their brain have exhibited extraordinary decision-
making and generalization abilities across various tasks. However, existing zero-shot agents …
Vision-and-language navigation today and tomorrow: A survey in the era of foundation models
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years
and many approaches have emerged to advance their development. The remarkable …
Large multimodal agents: A survey
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …
Understanding World or Predicting Future? A Comprehensive Survey of World Models
The concept of world models has garnered significant attention due to advancements in
multimodal large language models such as GPT-4 and video generation models such as …
Mapgpt: Map-guided prompting for unified vision-and-language navigation
Embodied agents equipped with GPT as their brain have exhibited extraordinary thinking
and decision-making abilities across various tasks. However, existing zero-shot agents for …
Sim-to-real transfer via 3d feature fields for vision-and-language navigation
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location in
3D environments following the natural language instruction. In this field, the agent is usually …
Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation
LLM-based agents have demonstrated impressive zero-shot performance in vision-
language navigation (VLN) task. However, existing LLM-based methods often focus only on …
Thinkbot: Embodied instruction following with thought chain reasoning
Embodied Instruction Following (EIF) requires agents to complete human instruction by
interacting objects in complicated surrounding environments. Conventional methods directly …