Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv…, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

A-OKVQA: A benchmark for visual question answering using world knowledge

D Schwenk, A Khandelwal, C Clark, K Marino… - European Conference on …, 2022 - Springer
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed
for the development of AI models that can jointly reason over visual and natural language …

Wiki-LLaVA: Hierarchical retrieval-augmented generation for multimodal LLMs

D Caffagni, F Cocchi, N Moratelli… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal LLMs are the natural evolution of LLMs and enlarge their capabilities so as to
work beyond the pure textual modality. As research is being carried out to design novel …

Transform-Retrieve-Generate: Natural language-centric outside-knowledge visual question answering

F Gao, Q Ping, G Thattai, A Reganti… - Proceedings of the …, 2022 - openaccess.thecvf.com
Outside-knowledge visual question answering (OK-VQA) requires the agent to comprehend
the image, make use of relevant knowledge from the entire web, and digest all the …

Can pre-trained vision and language models answer visual information-seeking questions?

Y Chen, H Hu, Y Luan, H Sun, S Changpinyo… - arXiv preprint arXiv…, 2023 - arxiv.org
Pre-trained vision and language models have demonstrated state-of-the-art capabilities over
existing tasks involving images and texts, including visual question answering. However, it …

Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories

T Mensink, J Uijlings, L Castrejon… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Encyclopedic-VQA, a large-scale visual question answering (VQA) dataset
featuring visual questions about detailed properties of fine-grained categories and …

A comprehensive evaluation of GPT-4V on knowledge-intensive visual question answering

Y Li, L Wang, B Hu, X Chen, W Zhong, C Lyu… - arXiv preprint arXiv…, 2023 - arxiv.org
The emergence of multimodal large models (MLMs) has significantly advanced the field of
visual understanding, offering remarkable capabilities in the realm of visual question …

Weakly-supervised visual-retriever-reader for knowledge-based question answering

M Luo, Y Zeng, P Banerjee, C Baral - arXiv preprint arXiv:2109.04014, 2021 - arxiv.org
Knowledge-based visual question answering (VQA) requires answering questions with
external knowledge in addition to the content of images. One dataset that is mostly used in …

LaKo: Knowledge-driven visual question answering via late knowledge-to-text injection

Z Chen, Y Huang, J Chen, Y Geng, Y Fang… - Proceedings of the 11th …, 2022 - dl.acm.org
Visual question answering (VQA) often requires an understanding of visual concepts and
language semantics, which relies on external knowledge. Most existing methods exploit pre …