LLM-Planner: Few-shot grounded planning for embodied agents with large language models

CH Song, J Wu, C Washington… - Proceedings of the …, 2023 - openaccess.thecvf.com
This study focuses on using large language models (LLMs) as a planner for embodied
agents that can follow natural language instructions to complete complex tasks in a visually …

ESC: Exploration with soft commonsense constraints for zero-shot object navigation

K Zhou, K Zheng, C Pryor, Y Shen… - International …, 2023 - proceedings.mlr.press
The ability to accurately locate and navigate to a specific object is a crucial capability for
embodied agents that operate in the real world and interact with objects to complete tasks …

VLMbench: A compositional benchmark for vision-and-language manipulation

K Zheng, X Chen, OC Jenkins… - Advances in Neural …, 2022 - proceedings.neurips.cc
Benefiting from language flexibility and compositionality, humans naturally intend to use
language to command an embodied agent for complex tasks such as navigation and object …

Vision-and-language navigation today and tomorrow: A survey in the era of foundation models

Y Zhang, Z Ma, J Li, Y Qiao, Z Wang, J Chai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-Language Navigation (VLN) has gained increasing attention over recent years,
and many approaches have emerged to advance its development. The remarkable …

Advancements and challenges in mobile robot navigation: A comprehensive review of algorithms and potential for self-learning approaches

S Al Mahmud, A Kamarulariffin, AM Ibrahim… - Journal of Intelligent & …, 2024 - Springer
Mobile robot navigation has been a very popular research topic for quite some time. With
the goal of enhancing autonomy in mobile robot navigation, numerous …

Scene-LLM: Extending language model for 3D visual understanding and reasoning

R Fu, J Liu, X Chen, Y Nie, W Xiong - arXiv preprint arXiv:2403.11401, 2024 - arxiv.org
This paper introduces Scene-LLM, a 3D-visual-language model that enhances embodied
agents' abilities in interactive 3D indoor environments by integrating the reasoning strengths …

Open-ended instructable embodied agents with memory-augmented large language models

G Sarch, Y Wu, MJ Tarr, K Fragkiadaki - arXiv preprint arXiv:2310.15127, 2023 - arxiv.org
Pre-trained and frozen LLMs can effectively map simple scene re-arrangement instructions
to programs over a robot's visuomotor functions through appropriate few-shot example …

Plan, posture and go: Towards open-world text-to-motion generation

J Liu, W Dai, C Wang, Y Cheng, Y Tang… - arXiv preprint arXiv …, 2023 - arxiv.org
Conventional text-to-motion generation methods are usually trained on limited text-motion
pairs, making them hard to generalize to open-world scenarios. Some works use the CLIP …

Plan, posture and go: Towards open-vocabulary text-to-motion generation

J Liu, W Dai, C Wang, Y Cheng, Y Tang… - European Conference on …, 2024 - Springer
Conventional text-to-motion generation methods are usually trained on limited text-motion
pairs, making them hard to generalize to open-vocabulary scenarios. Some works use the …

To boost zero-shot generalization for embodied reasoning with vision-language pre-training

K Su, X Zhang, S Zhang, J Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Recently, there has been increasing research interest in embodied artificial intelligence (EAI),
which involves an agent learning to perform a specific task while dynamically interacting …