A survey of efficient fine-tuning methods for vision-language models—prompt and adapter
J **ng, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier
Abstract Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …
Vqa and visual reasoning: An overview of recent datasets, methods and challenges
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent
years. This achievement can be ascribed in part to advances in AI subfields including …
Dual self-attention with co-attention networks for visual question answering
Abstract Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and has attracted wide interest. In previous VQA methods …
A survey of methods, datasets and evaluation metrics for visual question answering
Abstract Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both the computer vision and natural language processing …
An improved attention and hybrid optimization technique for visual question answering
Abstract In Visual Question Answering (VQA), an attention mechanism has a critical role in
specifying the different objects present in an image or telling the machine where to focus by …
Image captioning improved visual question answering
Abstract Both Visual Question Answering (VQA) and image captioning are problems
that involve the Computer Vision (CV) and Natural Language Processing (NLP) domains. In …
Positional attention guided transformer-like architecture for visual question answering
A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …
Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism
D Chen, J Chen, L Yang, F Shang - Neurocomputing, 2024 - Elsevier
Visual question answering (VQA) holds the potential to enhance artificial intelligence
proficiency in understanding natural language, stimulate advances in computer vision …
Innovating sustainability: VQA-based AI for carbon neutrality challenges
Y Chen, Q Li, JY Liu - Journal of Organizational and End User …, 2024 - igi-global.com
In today's global society, carbon neutrality has become a focal point of concern. Greenhouse
gas emissions and rising atmospheric temperatures are triggering various extreme weather …
A question-guided multi-hop reasoning graph network for visual question answering
Abstract Visual Question Answering (VQA) requires reasoning about the visually-grounded
relations in the image and question context. A crucial aspect of solving complex questions is …