CityNav: Language-Goal Aerial Navigation Dataset with Geographic Information

J Lee, T Miyanishi, S Kurita, K Sakamoto… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-and-language navigation (VLN) aims to guide autonomous agents through real-world
environments by integrating visual and linguistic cues. Despite notable advancements in …

OVExp: Open Vocabulary Exploration for Object-Oriented Navigation

M Wei, T Wang, Y Chen, H Wang, J Pang… - arXiv preprint arXiv …, 2024 - arxiv.org
Object-oriented embodied navigation aims to locate specific objects, defined by category or
depicted in images. Existing methods often struggle to generalize to open vocabulary goals …

Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments

L Barsellotti, R Bigazzi, M Cornia… - Advances in …, 2025 - proceedings.neurips.cc
In recent years, research interest in visual navigation towards objects in indoor
environments has grown significantly. This growth can be attributed to the recent availability …

NL-SLAM for OC-VLN: Natural Language Grounded SLAM for Object-Centric VLN

S Raychaudhuri, D Ta, K Ashton, AX Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Landmark-based navigation (e.g., go to the wooden desk) and relative positional navigation
(e.g., move 5 meters forward) are distinct navigation challenges solved very differently in …

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

G Zhou, Y Hong, Z Wang, C Zhao, M Bansal… - arXiv preprint arXiv …, 2024 - arxiv.org
The academic field of learning instruction-guided visual navigation can be generally
categorized into high-level category-specific search and low-level language-guided …

VLN-Game: Vision-Language Equilibrium Search for Zero-Shot Semantic Navigation

B Yu, Y Liu, L Han, H Kasaei, T Li, M Cao - arXiv preprint arXiv …, 2024 - arxiv.org
Following human instructions to explore and search for a specified target in an unfamiliar
environment is a crucial skill for mobile service robots. Most of the previous works on object …

Navigation with VLM framework: Go to Any Language

Z Yin, C Cheng - arXiv preprint arXiv:2410.02787, 2024 - arxiv.org
Navigating towards fully open language goals and exploring open scenes in a manner akin
to human exploration have always posed significant challenges. Recently, Vision Large …

EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents

Z Cheng, Y Tu, R Li, S Dai, J Hu, S Hu, J Li… - arXiv preprint arXiv …, 2025 - arxiv.org
Multimodal Large Language Models (MLLMs) have shown significant advancements,
providing a promising future for embodied agents. Existing benchmarks for evaluating …

SnapMem: Snapshot-based 3D Scene Memory for Embodied Exploration and Reasoning

Y Yang, H Yang, J Zhou, P Chen, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Constructing compact and informative 3D scene representations is essential for effective
embodied exploration and reasoning, especially in complex environments over long …

The One RING: a Robotic Indoor Navigation Generalist

A Eftekhar, L Weihs, R Hendrix… - arXiv preprint arXiv …, 2024 - one-ring-policy.allen.ai
Modern robots vary significantly in shape, size, and sensor configurations used to perceive
and interact with their environments. However, most navigation policies are embodiment …