Aligning cyber space with physical world: A comprehensive survey on embodied AI

Y Liu, W Chen, Y Bai, X Liang, G Li, W Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …

NavGPT: Explicit reasoning in vision-and-language navigation with large language models

G Zhou, Y Hong, Q Wu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT
and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. Such …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

How much can CLIP benefit vision-and-language tasks?

S Shen, LH Li, H Tan, M Bansal, A Rohrbach… - arXiv preprint arXiv …, 2021 - arxiv.org
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

History aware multimodal transformer for vision-and-language navigation

S Chen, PL Guhur, C Schmid… - Advances in neural …, 2021 - proceedings.neurips.cc
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …

Volumetric environment representation for vision-language navigation

R Liu, W Wang, Y Yang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment
based on visual observations and natural language instructions. It is clear that the pivotal …

Think global, act local: Dual-scale graph transformer for vision-and-language navigation

S Chen, PL Guhur, M Tapaswi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …

Bird's-eye-view scene graph for vision-language navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

GridMM: Grid memory map for vision-and-language navigation

Z Wang, X Li, J Yang, Y Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-and-language navigation (VLN) enables the agent to navigate to a remote location
following the natural language instruction in 3D environments. To represent the previously …