From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e., describing images …

A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

Visual attention methods in deep learning: An in-depth survey

M Hassanin, S Anwar, I Radwan, FS Khan, A Mian - Information Fusion, 2024 - Elsevier
Inspired by the human cognitive system, attention is a mechanism that imitates the human
cognitive awareness about specific information, amplifying critical details to focus more on …

An overview of image caption generation methods

H Wang, Y Zhang, X Yu - Computational intelligence and …, 2020 - Wiley Online Library
In recent years, with the rapid development of artificial intelligence, image caption has
gradually attracted the attention of many researchers in the field of artificial intelligence and …

CapSal: Leveraging captioning to boost semantics for salient object detection

L Zhang, J Zhang, Z Lin, H Lu… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Detecting salient objects in cluttered scenes is a big challenge. To address this problem, we
argue that the model needs to learn discriminative semantic features for salient objects. To …

Human gaze assisted artificial intelligence: A review

R Zhang, A Saran, B Liu, Y Zhu, S Guo… - IJCAI: Proceedings of …, 2020 - ncbi.nlm.nih.gov
Human gaze reveals a wealth of information about internal cognitive state. Thus, gaze-
related research has significantly increased in computer vision, natural language …

Model-agnostic gender debiased image captioning

Y Hirota, Y Nakashima… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning models are known to perpetuate and amplify harmful societal bias in the
training set. In this work, we aim to mitigate such gender bias in image captioning models …

Paying more attention to saliency: Image captioning with saliency and context attention

M Cornia, L Baraldi, G Serra, R Cucchiara - ACM Transactions on …, 2018 - dl.acm.org
Image captioning has been recently gaining a lot of attention thanks to the impressive
achievements shown by deep captioning architectures, which combine Convolutional …

Predicting human scanpaths in visual question answering

X Chen, M Jiang, Q Zhao - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Attention has been an important mechanism for both humans and computer vision systems.
While state-of-the-art models to predict attention focus on estimating a static probabilistic …

GazeXplain: Learning to predict natural language explanations of visual scanpaths

X Chen, M Jiang, Q Zhao - European Conference on Computer Vision, 2024 - Springer
While exploring visual scenes, humans' scanpaths are driven by their underlying attention
processes. Understanding visual scanpaths is essential for various applications. Traditional …