A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues

H Sharma, D Padha - Artificial Intelligence Review, 2023 - Springer
Image captioning is a pretty modern area of the convergence of computer vision and natural
language processing and is widely used in a range of applications such as multi-modal …

Remote sensing image change captioning with dual-branch transformers: A new method and a large scale dataset

C Liu, R Zhao, H Chen, Z Zou… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Analyzing land cover changes with multitemporal remote sensing (RS) images is crucial for
environmental protection and land planning. In this article, we explore RS image change …

High-resolution remote sensing image captioning based on structured attention

R Zhao, Z Shi, Z Zou - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org
Automatically generating language descriptions of remote sensing images has become an
emerging research hot spot in the remote sensing field. Attention-based captioning, as a …

NWPU-captions dataset and MLCA-net for remote sensing image captioning

Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …

A decoupling paradigm with prompt learning for remote sensing image change captioning

C Liu, R Zhao, J Chen, Z Qi, Z Zou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Remote sensing image change captioning (RSICC) is a novel task that aims to describe the
differences between bitemporal images by natural language. Previous methods ignore a …

Bi-modal transformer-based approach for visual question answering in remote sensing imagery

Y Bazi, MM Al Rahhal, ML Mekhalfi… - … on Geoscience and …, 2022 - ieeexplore.ieee.org
Recently, vision-language models based on transformers are gaining popularity for joint
modeling of visual and textual modalities. In particular, they show impressive results when …

Language Integration in Remote Sensing: Tasks, datasets, and future directions

L Bashmal, Y Bazi, F Melgani… - … and Remote Sensing …, 2023 - ieeexplore.ieee.org
The emerging field of vision–language models, which combines computer vision and natural
language processing (NLP), has gained significant interest and exploration. This integration …

Global visual feature and linguistic state guided attention for remote sensing image captioning

Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …

A novel SVM-based decoder for remote sensing image captioning

G Hoxha, F Melgani - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org
Most of the remote sensing image captioning (IC) models are based on encoder–decoder
frameworks where a convolutional neural network (CNN) encodes the image information …

Remote-sensing image captioning based on multilayer aggregated transformer

C Liu, R Zhao, Z Shi - IEEE Geoscience and Remote Sensing …, 2022 - ieeexplore.ieee.org
Remote-sensing image (RSI) captioning aims to automatically generate sentences
describing the content of RSIs. The multiscale information of RSIs contains attributes and …