QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

Room-Across-Room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation

X Wang, Q Huang, A Celikyilmaz… - Proceedings of the …, 2019 - openaccess.thecvf.com
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out
natural language instructions inside real 3D environments. In this paper, we study how to …

TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

RUBi: Reducing unimodal biases for visual question answering

R Cadene, C Dancette, M Cord… - Advances in neural …, 2019 - proceedings.neurips.cc
Visual Question Answering (VQA) is the task of answering questions about an
image. VQA models often exploit unimodal biases to provide the correct answer …