Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep learning and its applications have driven impactful research and development across the diverse range of modalities present in real-world data. More recently, this has …

Just ask: Learning to answer questions from millions of narrated videos

A Yang, A Miech, J Sivic, I Laptev… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent methods for visual question answering rely on large-scale annotated datasets.
Manual annotation of questions and answers for videos, however, is tedious, expensive and …

Bottom-up and top-down attention for image captioning and visual question answering

P Anderson, X He, C Buehler… - Proceedings of the …, 2018 - openaccess.thecvf.com
Top-down visual attention mechanisms have been used extensively in image captioning
and visual question answering (VQA) to enable deeper image understanding through fine …
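
The snippet above refers to attending over image regions under top-down (question-driven) control. The minimal PyTorch sketch below illustrates that general bottom-up/top-down pattern by re-weighting precomputed "bottom-up" region features with a question encoding; the class name, layer sizes, and feature dimensions are illustrative assumptions, not the paper's architecture.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownAttention(nn.Module):
    """Question-guided soft attention over precomputed region features.

    A minimal sketch of the general bottom-up/top-down idea: "bottom-up"
    region features (e.g. from an object detector) are re-weighted by a
    "top-down" signal derived from the question. Sizes are illustrative,
    not the paper's configuration.
    """

    def __init__(self, region_dim=2048, question_dim=512, hidden_dim=512):
        super().__init__()
        self.proj_regions = nn.Linear(region_dim, hidden_dim)
        self.proj_question = nn.Linear(question_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, question):
        # regions:  (batch, num_regions, region_dim) bottom-up features
        # question: (batch, question_dim) pooled question encoding
        joint = torch.tanh(
            self.proj_regions(regions) + self.proj_question(question).unsqueeze(1)
        )
        weights = F.softmax(self.score(joint), dim=1)   # (batch, num_regions, 1)
        return (weights * regions).sum(dim=1)           # attended image feature

# Toy usage with random tensors standing in for detector and question features.
att = TopDownAttention()
v = att(torch.randn(2, 36, 2048), torch.randn(2, 512))
print(v.shape)  # torch.Size([2, 2048])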

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Don't just assume; look and answer: Overcoming priors for visual question answering

A Agrawal, D Batra, D Parikh… - Proceedings of the …, 2018 - openaccess.thecvf.com
A number of studies have found that today's Visual Question Answering (VQA) models are
heavily driven by superficial correlations in the training data and lack sufficient image …

Pseudo-Q: Generating pseudo language queries for visual grounding

H Jiang, Y Lin, D Han, S Song… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual grounding, i.e., localizing objects in images according to natural language queries, is
an important topic in visual language understanding. The most effective approaches for this …
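
As a generic illustration of the grounding formulation described above (not of Pseudo-Q's pseudo-query generation pipeline), the sketch below picks the detected region whose embedding best matches a query embedding; the text and region encoders producing those embeddings are assumed and not shown.

import torch
import torch.nn.functional as F

def ground_query(query_embedding, region_embeddings, region_boxes):
    """Pick the detected region whose embedding best matches the query.

    A generic illustration of the visual-grounding formulation only. Assumes a
    text encoder and a region encoder (not shown) have already produced
    embeddings in a shared space.
    """
    query = F.normalize(query_embedding, dim=-1)      # (d,)
    regions = F.normalize(region_embeddings, dim=-1)  # (num_regions, d)
    scores = regions @ query                          # (num_regions,)
    return region_boxes[scores.argmax()]              # box of the best region

# Toy usage: 10 candidate regions with 256-d embeddings and xyxy boxes.
boxes = torch.rand(10, 4)
print(ground_query(torch.randn(256), torch.randn(10, 256), boxes))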

A zero-shot framework for sketch based image retrieval

SK Yelamarthi, SK Reddy… - Proceedings of the …, 2018 - openaccess.thecvf.com
Sketch-based image retrieval (SBIR) is the task of retrieving images from a natural image
database that correspond to a given hand-drawn sketch. Ideally, an SBIR model should …
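
The retrieval step of SBIR can be illustrated in a few lines of PyTorch: rank database images by cosine similarity to the query sketch in a shared embedding space. This is a minimal sketch of the task setup only, assuming sketch and image encoders (not shown); it is not the zero-shot framework proposed in the paper.

import torch
import torch.nn.functional as F

def retrieve(sketch_embedding, image_embeddings, top_k=5):
    """Rank database images by cosine similarity to a query sketch.

    Illustrates only the retrieval step of SBIR: it assumes sketches and
    images have already been mapped into a shared embedding space by some
    pair of encoders (not shown, and not the paper's specific model).
    """
    sketch = F.normalize(sketch_embedding.unsqueeze(0), dim=-1)  # (1, d)
    images = F.normalize(image_embeddings, dim=-1)               # (n, d)
    scores = images @ sketch.t()                                 # (n, 1)
    return scores.squeeze(1).topk(top_k).indices                 # best matches

# Toy usage: 1000 database images and one query sketch, both 256-d embeddings.
db = torch.randn(1000, 256)
query = torch.randn(256)
print(retrieve(query, db))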

All you may need for VQA are image captions

S Changpinyo, D Kukliansky, I Szpektor… - arXiv preprint arXiv …, 2022 - arxiv.org
Visual Question Answering (VQA) has benefited from increasingly sophisticated models, but
has not enjoyed the same level of engagement in terms of data creation. In this paper, we …

MUTANT: A training paradigm for out-of-distribution generalization in visual question answering

T Gokhale, P Banerjee, C Baral, Y Yang - arXiv preprint arXiv:2009.08566, 2020 - arxiv.org
While progress has been made on the visual question answering leaderboards, models
often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such …

Learning what makes a difference from counterfactual examples and gradient supervision

D Teney, E Abbasnejad, A van den Hengel - Computer Vision–ECCV …, 2020 - Springer
One of the primary challenges limiting the applicability of deep learning is its susceptibility to
learning spurious correlations rather than the underlying mechanisms of the task of interest …
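
One way to read "gradient supervision over counterfactual examples" is as an auxiliary loss that aligns the input gradient of the task loss with the direction from an example to its counterfactual. The PyTorch sketch below implements that reading; the loss form and the 0.1 weight are assumptions, not necessarily the paper's exact objective.

import torch
import torch.nn.functional as F

def gradient_supervision_loss(model, x, x_cf, y):
    """Task loss plus a term encouraging the input gradient to point from an
    example toward its counterfactual.

    A rough sketch of one gradient-supervision formulation over counterfactual
    pairs (x, x_cf); the exact loss used in the paper may differ. `model` maps
    a feature vector to class logits.
    """
    x = x.clone().requires_grad_(True)
    loss_task = F.cross_entropy(model(x), y)
    # Gradient of the task loss w.r.t. the input features.
    grad = torch.autograd.grad(loss_task, x, create_graph=True)[0]
    # Direction from each example to its counterfactual.
    direction = x_cf - x
    cos = F.cosine_similarity(grad.flatten(1), direction.flatten(1), dim=1)
    loss_gs = (1.0 - cos).mean()      # penalize misaligned input gradients
    return loss_task + 0.1 * loss_gs  # 0.1 is an illustrative weight

# Toy usage: a linear classifier over 32-d features and a batch of 4 pairs.
model = torch.nn.Linear(32, 3)
x, x_cf = torch.randn(4, 32), torch.randn(4, 32)
y = torch.randint(0, 3, (4,))
print(gradient_supervision_loss(model, x, x_cf, y))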