QA dataset explosion: A taxonomy of NLP resources for question answering and reading comprehension

A Rogers, M Gardner, I Augenstein - ACM Computing Surveys, 2023 - dl.acm.org

Alongside huge volumes of research on deep learning models in NLP in the recent years,
there has been much work on benchmark datasets needed to track modeling progress …

Cited by 230

[PDF] arxiv.org

ScanRefer: 3D object localization in RGB-D scans using natural language

DZ Chen, AX Chang, M Nießner - European conference on computer …, 2020 - Springer

We introduce the task of 3D object localization in RGB-D scans using natural language
descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free …

Cited by 355

[PDF] arxiv.org

Remind your neural network to prevent catastrophic forgetting

TL Hayes, K Kafle, R Shrestha, M Acharya… - European conference on …, 2020 - Springer

People learn throughout life. However, incrementally updating conventional neural networks
leads to catastrophic forgetting. A common remedy is replay, which is inspired by how the …

Cited by 367

[PDF] frontiersin.org

Challenges and prospects in vision and language research

K Kafle, R Shrestha, C Kanan - Frontiers in Artificial Intelligence, 2019 - frontiersin.org

Language grounded image understanding tasks have often been proposed as a method for
evaluating progress in artificial intelligence. Ideally, these tasks should test a plethora of …

Cited by 49

[PDF] thecvf.com

Answer them all! toward universal visual question answering models

R Shrestha, K Kafle, C Kanan - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Visual Question Answering (VQA) research is split into two camps: the first focuses
on VQA datasets that require natural image understanding and the second focuses on …

Cited by 107

[PDF] arxiv.org

Rodeo: Replay for online object detection

M Acharya, TL Hayes, C Kanan - arXiv preprint arXiv:2008.06439, 2020 - arxiv.org

Humans can incrementally learn to do new visual detection tasks, which is a huge challenge
for today's computer vision systems. Incrementally trained deep learning models lack …

Cited by 66

[PDF] aclanthology.org

A negative case analysis of visual grounding methods for VQA

R Shrestha, K Kafle, C Kanan - … of the 58th annual meeting of the …, 2020 - aclanthology.org

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases
and spurious statistical correlations, instead of producing right answers for the right reasons …

Cited by 63

[PDF] thecvf.com

Answering questions about data visualizations using efficient bimodal fusion

K Kafle, R Shrestha, S Cohen… - Proceedings of the …, 2020 - openaccess.thecvf.com

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task
where an algorithm must answer questions about data visualizations, e.g., bar charts, pie …

Cited by 66

[PDF] arxiv.org

Overcoming the stability gap in continual learning

MY Harun, C Kanan - arXiv preprint arXiv:2306.01904, 2023 - arxiv.org

Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making
business decisions and to serve users; however, a major problem is model decay, where the …

Cited by 11

[PDF] arxiv.org

Revisiting multi-modal LLM evaluation

J Lu, S Srivastava, J Chen, R Shrestha… - arXiv preprint arXiv …, 2024 - arxiv.org

With the advent of multi-modal large language models (MLLMs), datasets used for visual
question answering (VQA) and referring expression comprehension have seen a …

Cited by 4

VQD: Visual query detection in natural scenes

Qa dataset explosion: A taxonomy of nlp resources for question answering and reading comprehension