LLM-Planner: Few-shot grounded planning for embodied agents with large language models
This study focuses on using large language models (LLMs) as a planner for embodied
agents that can follow natural language instructions to complete complex tasks in a visually …
ESC: Exploration with soft commonsense constraints for zero-shot object navigation
The ability to accurately locate and navigate to a specific object is a crucial capability for
embodied agents that operate in the real world and interact with objects to complete tasks …
VLMbench: A compositional benchmark for vision-and-language manipulation
Benefiting from the flexibility and compositionality of language, humans naturally use
language to command an embodied agent for complex tasks such as navigation and object …
Vision-and-language navigation today and tomorrow: A survey in the era of foundation models
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years,
and many approaches have emerged to advance its development. The remarkable …
Advancements and challenges in mobile robot navigation: A comprehensive review of algorithms and potential for self-learning approaches
Mobile robot navigation has been a popular research topic for some time.
With the goal of enhancing the autonomy of mobile robot navigation, numerous …
Scene-LLM: Extending language model for 3D visual understanding and reasoning
This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied
agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths …
Open-ended instructable embodied agents with memory-augmented large language models
Pre-trained and frozen LLMs can effectively map simple scene re-arrangement instructions
to programs over a robot's visuomotor functions through appropriate few-shot example …
Plan, Posture and Go: Towards Open-Vocabulary Text-to-Motion Generation
Conventional text-to-motion generation methods are usually trained on limited text-motion
pairs, making them hard to generalize to open-vocabulary scenarios. Some works use the …
To Boost Zero-Shot Generalization for Embodied Reasoning With Vision-Language Pre-Training
Recently, there has been increasing research interest in embodied artificial intelligence (EAI),
which involves an agent learning to perform a specific task while dynamically interacting …