TEACh: Task-driven embodied agents that chat
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
SpartQA: A textual question answering benchmark for spatial reasoning
This paper proposes a question-answering (QA) benchmark for spatial reasoning on natural
language text which contains more realistic spatial phenomena not covered by prior work …
Vision-language navigation: a survey and taxonomy
Vision-language navigation (VLN) tasks require an agent to follow language instructions
from a human guide to navigate in previously unseen environments using visual …
Grounding open-domain instructions to automate web support tasks
Grounding natural language instructions on the web to perform previously unseen tasks
enables accessibility and automation. We introduce a task and dataset to train AI agents …
A meta-framework for spatiotemporal quantity extraction from text
News events are often associated with quantities (e.g., the number of COVID-19 patients or
the number of arrests in a protest), and it is often important to extract their type, time, and …
Unifying structure reasoning and language model pre-training for complex reasoning
Recent pre-trained language models (PLMs) equipped with foundation reasoning skills
have shown remarkable performance on downstream complex tasks. However, the …
Unifying Structure Reasoning and Language Pre-Training for Complex Reasoning Tasks
Recent pre-trained language models (PLMs) equipped with foundation reasoning skills
have shown remarkable performance on downstream complex tasks. However, the …
Into the Unknown: Generating Geospatial Descriptions for New Environments
Similar to vision-and-language navigation (VLN) tasks that focus on bridging the gap
between vision and language for embodied navigation, the new Rendezvous (RVS) task …
tagE: Enabling an Embodied Agent to Understand Human Instructions
Natural language serves as the primary mode of communication when an intelligent agent
with a physical presence engages with human beings. While a plethora of research focuses …