QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org
Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

ScanRefer: 3D object localization in RGB-D scans using natural language

DZ Chen, AX Chang, M Nießner - European conference on computer …, 2020 - Springer
We introduce the task of 3D object localization in RGB-D scans using natural language
descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free …

REMIND your neural network to prevent catastrophic forgetting

TL Hayes, K Kafle, R Shrestha, M Acharya… - European conference on …, 2020 - Springer
People learn throughout life. However, incrementally updating conventional neural networks
leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the …

Challenges and prospects in vision and language research

K Kafle, R Shrestha, C Kanan - Frontiers in Artificial Intelligence, 2019 - frontiersin.org
Language grounded image understanding tasks have often been proposed as a method for
evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of …

Answer them all! Toward universal visual question answering models

R Shrestha, K Kafle, C Kanan - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Visual Question Answering (VQA) research is split into two camps: the first focuses
on VQA datasets that require natural image understanding and the second focuses on …

RODEO: Replay for online object detection

M Acharya, TL Hayes, C Kanan - arXiv preprint arXiv:2008.06439, 2020 - arxiv.org
Humans can incrementally learn to do new visual detection tasks, which is a huge challenge
for today's computer vision systems. Incrementally trained deep learning models lack …

A negative case analysis of visual grounding methods for VQA

R Shrestha, K Kafle, C Kanan - … of the 58th annual meeting of the …, 2020 - aclanthology.org
Existing Visual Question Answering (VQA) methods tend to exploit dataset biases
and spurious statistical correlations, instead of producing right answers for the right reasons …

Answering questions about data visualizations using efficient bimodal fusion

K Kafle, R Shrestha, S Cohen… - Proceedings of the …, 2020 - openaccess.thecvf.com
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task
where an algorithm must answer questions about data visualizations, e.g., bar charts, pie …

Overcoming the stability gap in continual learning

MY Harun, C Kanan - arXiv preprint arXiv:2306.01904, 2023 - arxiv.org
Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making
business decisions and to serve users; however, a major problem is model decay, where the …

Revisiting multi-modal LLM evaluation

J Lu, S Srivastava, J Chen, R Shrestha… - arXiv preprint arXiv …, 2024 - arxiv.org
With the advent of multi-modal large language models (MLLMs), datasets used for visual
question answering (VQA) and referring expression comprehension have seen a …