Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

VLN BERT: A recurrent vision-and-language BERT for navigation

Y Hong, Q Wu, Y Qi… - Proceedings of the …, 2021 - openaccess.thecvf.com
Accuracy of many visiolinguistic tasks has benefited significantly from the application of
vision-and-language (V&L) BERT. However, its application for the task of vision-and …

Episodic transformer for vision-and-language navigation

A Pashevich, C Schmid, C Sun - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Interaction and navigation defined by natural language instructions in dynamic
environments pose significant challenges for neural agents. This paper focuses on …

Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

VELMA: Verbalization embodiment of LLM agents for vision and language navigation in street view

R Schumann, W Zhu, W Feng, TJ Fu… - Proceedings of the …, 2024 - ojs.aaai.org
Incremental decision making in real-world environments is one of the most challenging tasks
in embodied artificial intelligence. One particularly demanding scenario is Vision and …

EnvEdit: Environment editing for vision-and-language navigation

J Li, H Tan, M Bansal - … of the IEEE/CVF Conference on …, 2022 - openaccess.thecvf.com
In Vision-and-Language Navigation (VLN), an agent needs to navigate through the
environment based on natural language instructions. Due to limited available data for agent …