TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

SpartQA: A textual question answering benchmark for spatial reasoning

R Mirzaee, HR Faghihi, Q Ning… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper proposes a question-answering (QA) benchmark for spatial reasoning on natural
language text which contains more realistic spatial phenomena not covered by prior work …

Vision-language navigation: a survey and taxonomy

W Wu, T Chang, X Li, Q Yin, Y Hu - Neural Computing and Applications, 2024 - Springer
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …

Grounding open-domain instructions to automate web support tasks

N Xu, S Masling, M Du, G Campagna, L Heck… - arXiv preprint arXiv …, 2021 - arxiv.org
Grounding natural language instructions on the web to perform previously unseen tasks
enables accessibility and automation. We introduce a task and dataset to train AI agents …

A meta-framework for spatiotemporal quantity extraction from text

Q Ning, B Zhou, H Wu, H Peng, C Fan… - Proceedings of the …, 2022 - aclanthology.org
News events are often associated with quantities (e.g., the number of COVID-19 patients or
the number of arrests in a protest), and it is often important to extract their type, time, and …

Unifying structure reasoning and language model pre-training for complex reasoning

S Wang, Z Wei, J Xu, T Li, Z Fan - arXiv preprint arXiv:2301.08913, 2023 - arxiv.org
Recent pre-trained language models (PLMs) equipped with foundation reasoning skills
have shown remarkable performance on downstream complex tasks. However, the …

Unifying Structure Reasoning and Language Pre-Training for Complex Reasoning Tasks

S Wang, Z Wei, J Xu, T Li, Z Fan - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Recent pre-trained language models (PLMs) equipped with foundation reasoning skills
have shown remarkable performance on downstream complex tasks. However, the …

Into the Unknown: Generating Geospatial Descriptions for New Environments

T Paz-Argaman, J Palowitch, S Kulkarni… - arXiv preprint arXiv …, 2024 - arxiv.org
Similar to vision-and-language navigation (VLN) tasks that focus on bridging the gap
between vision and language for embodied navigation, the new Rendezvous (RVS) task …

tagE: Enabling an Embodied Agent to Understand Human Instructions

C Sarkar, A Mitra, P Pramanick, T Nayak - arXiv preprint arXiv:2310.15605, 2023 - arxiv.org
Natural language serves as the primary mode of communication when an intelligent agent
with a physical presence engages with human beings. While a plethora of research focuses …