QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …
Experience grounds language
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …
Room-across-room: Multilingual vision-and-language navigation with dense spatiotemporal grounding
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …
Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out
natural language instructions inside real 3D environments. In this paper, we study how to …
TEACh: Task-driven embodied agents that chat
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …
Core challenges in embodied vision-language planning
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …
Vision-and-language navigation: A survey of tasks, methods, and future directions
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …
RUBi: Reducing unimodal biases for visual question answering
Visual Question Answering (VQA) is the task of answering questions about an
image. Some VQA models often exploit unimodal biases to provide the correct answer …