Normalized and geometry-aware self-attention network for image captioning

L Guo, J Liu, X Zhu, P Yao, S Lu… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Self-attention (SA) network has shown profound value in image captioning. In this paper, we
improve SA from two aspects to promote the performance of image captioning. First, we …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …

Adaptive path selection for dynamic image captioning

T Xian, Z Li, Z Tang, H Ma - … on Circuits and Systems for Video …, 2022 - ieeexplore.ieee.org
Image captioning is a challenging task, i.e., given an image, a machine automatically generates
natural language that matches its semantic content and has attracted much attention in …

Vision-enhanced and consensus-aware transformer for image captioning

S Cao, G An, Z Zheng, Z Wang - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Image captioning generates descriptions in a natural language for a given image. Due to its
great potential for a wide range of applications, many deep learning-based methods have …

Joint embedding of deep visual and semantic features for medical image report generation

Y Yang, J Yu, J Zhang, W Han, H Jiang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Medical image report generation (MeIRG) aims at generating associated diagnosis
descriptions with natural language sentences from medical images, which is essential in the …

Prompt-based learning for unpaired image captioning

P Zhu, X Wang, L Zhu, Z Sun, WS Zheng… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from
unaligned vision-language sample pairs. Existing works usually tackle this task using …

Visual cluster grounding for image captioning

W Jiang, M Zhu, Y Fang, G Shi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Attention mechanisms have been extensively adopted in vision and language tasks such as
image captioning. It encourages a captioning model to dynamically ground appropriate …

Image difference captioning with instance-level fine-grained feature representation

Q Huang, Y Liang, J Wei, Y Cai, H Liang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
The task of image difference captioning aims at locating changed objects in similar image
pairs and describing the difference with natural language. The key challenges of this task …

Dual attention on pyramid feature maps for image captioning

L Yu, J Zhang, Q Wu - IEEE Transactions on Multimedia, 2021 - ieeexplore.ieee.org
Generating natural sentences from images is a fundamental learning task for
visual-semantic understanding in multimedia. In this paper, we propose to apply dual attention on …

Deep reinforcement polishing network for video captioning

W Xu, J Yu, Z Miao, L Wan, Y Tian… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The video captioning task aims to describe video content using several natural-language
sentences. Although one-step encoder-decoder models have achieved promising progress …