- Academic Search

J **ng, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier

Abstract Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …

Cited by 21

VQA and visual reasoning: An overview of recent datasets, methods and challenges

RY Zakari, JW Owusu, H Wang, K Qin, ZK Lawal… - arXiv preprint arXiv …, 2022 - arxiv.org

Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent
years. This achievement can be ascribed in part to advances in AI subfields including …

Cited by 17

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier

Abstract Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

Cited by 63

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier

Abstract Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

Cited by 48

An improved attention and hybrid optimization technique for visual question answering

H Sharma, AS Jalal - Neural Processing Letters, 2022 - Springer

Abstract In Visual Question Answering (VQA), an attention mechanism has a critical role in
specifying the different objects present in an image or tells the machine where to focus by …

Cited by 38

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer

Abstract Both Visual Question Answering (VQA) and image captioning are the problems
which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In …

Cited by 40

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Cited by 22

Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism

D Chen, J Chen, L Yang, F Shang - Neurocomputing, 2024 - Elsevier

Visual question answering (VQA) holds the potential to enhance artificial intelligence
proficiency in understanding natural language, stimulate advances in computer vision …

Cited by 6

Innovating sustainability: VQA-based AI for carbon neutrality challenges

Y Chen, Q Li, JY Liu - Journal of Organizational and End User …, 2024 - igi-global.com

In today's global society, carbon neutrality has become a focal point of concern. Greenhouse
gas emissions and rising atmospheric temperatures are triggering various extreme weather …

Cited by 17

A question-guided multi-hop reasoning graph network for visual question answering

Z Xu, J Gu, M Liu, G Zhou, H Fu, C Qiu - Information Processing & …, 2023 - Elsevier

Abstract Visual Question Answering (VQA) requires reasoning about the visually-grounded
relations in the image and question context. A crucial aspect of solving complex questions is …

Cited by 13

ALSA: adversarial learning of supervised attentions for visual question answering

A survey of efficient fine-tuning methods for vision-language models—prompt and adapter