QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Experience grounds language

Y Bisk, A Holtzman, J Thomason, J Andreas… - arXiv preprint arXiv …, 2020 - arxiv.org
Language understanding research is held back by a failure to relate language to the
physical world it describes and to the social interactions it facilitates. Despite the incredible …

Room-Across-Room: Multilingual vision-and-language navigation with dense spatiotemporal grounding

A Ku, P Anderson, R Patel, E Ie, J Baldridge - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce Room-Across-Room (RxR), a new Vision-and-Language Navigation (VLN)
dataset. RxR is multilingual (English, Hindi, and Telugu) and larger (more paths and …

Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation

X Wang, Q Huang, A Celikyilmaz… - Proceedings of the …, 2019 - openaccess.thecvf.com
Vision-language navigation (VLN) is the task of navigating an embodied agent to carry out
natural language instructions inside real 3D environments. In this paper, we study how to …

TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

Core challenges in embodied vision-language planning

J Francis, N Kitamura, F Labelle, X Lu, I Navarro… - Journal of Artificial …, 2022 - jair.org
Recent advances in the areas of multimodal machine learning and artificial intelligence (AI)
have led to the development of challenging tasks at the intersection of Computer Vision …

Vision-and-language navigation: A survey of tasks, methods, and future directions

J Gu, E Stefani, Q Wu, J Thomason… - arXiv preprint arXiv …, 2022 - arxiv.org
A long-term goal of AI research is to build intelligent agents that can communicate with
humans in natural language, perceive the environment, and perform real-world tasks. Vision …

RUBi: Reducing unimodal biases for visual question answering

R Cadene, C Dancette, M Cord… - Advances in neural …, 2019 - proceedings.neurips.cc
Visual Question Answering (VQA) is the task of answering questions about an
image. VQA models often exploit unimodal biases to provide the correct answer …