A survey of efficient fine-tuning methods for vision-language models—prompt and adapter

J Xing, J Liu, J Wang, L Sun, X Chen, X Gu… - Computers & Graphics, 2024 - Elsevier
Vision Language Model (VLM) is a popular research field located at the fusion of
computer vision and natural language processing (NLP). With the emergence of transformer …

VQA and visual reasoning: An overview of recent datasets, methods and challenges

RY Zakari, JW Owusu, H Wang, K Qin, ZK Lawal… - arXiv preprint arXiv …, 2022 - arxiv.org
Artificial Intelligence (AI) and its applications have sparked extraordinary interest in recent
years. This achievement can be ascribed in part to advances in AI subfields including …

Dual self-attention with co-attention networks for visual question answering

Y Liu, X Zhang, Q Zhang, C Li, F Huang, X Tang, Z Li - Pattern Recognition, 2021 - Elsevier
Visual Question Answering (VQA) as an important task in understanding vision and
language has been proposed and aroused wide interests. In previous VQA methods …

A survey of methods, datasets and evaluation metrics for visual question answering

H Sharma, AS Jalal - Image and Vision Computing, 2021 - Elsevier
Visual Question Answering (VQA) is a multi-disciplinary research problem that has
captured the attention of both computer vision as well as natural language processing …

An improved attention and hybrid optimization technique for visual question answering

H Sharma, AS Jalal - Neural Processing Letters, 2022 - Springer
In Visual Question Answering (VQA), an attention mechanism has a critical role in
specifying the different objects present in an image or tells the machine where to focus by …

Image captioning improved visual question answering

H Sharma, AS Jalal - Multimedia Tools and Applications, 2022 - Springer
Both Visual Question Answering (VQA) and image captioning are the problems
which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In …

Positional attention guided transformer-like architecture for visual question answering

A Mao, Z Yang, K Lin, J Xuan… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Transformer architectures have recently been introduced into the field of visual question
answering (VQA), due to their powerful capabilities of information extraction and fusion …

Mix-tower: Light visual question answering framework based on exclusive self-attention mechanism

D Chen, J Chen, L Yang, F Shang - Neurocomputing, 2024 - Elsevier
Visual question answering (VQA) holds the potential to enhance artificial intelligence
proficiency in understanding natural language, stimulate advances in computer vision …

Innovating sustainability: VQA-based AI for carbon neutrality challenges

Y Chen, Q Li, JY Liu - Journal of Organizational and End User …, 2024 - igi-global.com
In today's global society, carbon neutrality has become a focal point of concern. Greenhouse
gas emissions and rising atmospheric temperatures are triggering various extreme weather …

A question-guided multi-hop reasoning graph network for visual question answering

Z Xu, J Gu, M Liu, G Zhou, H Fu, C Qiu - Information Processing & …, 2023 - Elsevier
Visual Question Answering (VQA) requires reasoning about the visually-grounded
relations in the image and question context. A crucial aspect of solving complex questions is …