LLM-Planner: Few-shot grounded planning for embodied agents with large language models
This study focuses on using large language models (LLMs) as a planner for embodied
agents that can follow natural language instructions to complete complex tasks in a visually …
LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …
Habitat 2.0: Training home assistants to rearrange their habitat
We introduce Habitat 2.0 (H2.0), a simulation platform for training virtual robots in
interactive 3D environments and complex physics-enabled scenarios. We make …
How much can CLIP benefit vision-and-language tasks?
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using
a relatively small set of manually-annotated data (as compared to web-crawled data), to …
Language to rewards for robotic skill synthesis
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse
new capabilities through in-context learning, ranging from logical reasoning to code-writing …
Habitat-Matterport 3D dataset (HM3D): 1000 large-scale 3D environments for embodied AI
We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of
1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each …
A survey on multimodal large language models for autonomous driving
With the emergence of Large Language Models (LLMs) and Vision Foundation Models
(VFMs), multimodal AI systems benefiting from large models have the potential to equally …
History aware multimodal transformer for vision-and-language navigation
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …
Embodied navigation with multi-modal information: A survey from tasks to methodology
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …
Conversational information seeking
Conversational information seeking (CIS) is concerned with a sequence of interactions
between one or more users and an information system. Interactions in CIS are primarily …