A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues
H Sharma, D Padha - Artificial Intelligence Review, 2023 - Springer
Image captioning is a pretty modern area of the convergence of computer vision and natural
language processing and is widely used in a range of applications such as multi-modal …
Remote sensing image change captioning with dual-branch transformers: A new method and a large scale dataset
Analyzing land cover changes with multitemporal remote sensing (RS) images is crucial for
environmental protection and land planning. In this article, we explore RS image change …
High-resolution remote sensing image captioning based on structured attention
Automatically generating language descriptions of remote sensing images has become an
emerging research hot spot in the remote sensing field. Attention-based captioning, as a …
NWPU-captions dataset and MLCA-net for remote sensing image captioning
Q Cheng, H Huang, Y Xu, Y Zhou, H Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recently, the burgeoning demands for captioning-related applications have inspired great
endeavors in the remote sensing community. However, current benchmark datasets are …
A decoupling paradigm with prompt learning for remote sensing image change captioning
Remote sensing image change captioning (RSICC) is a novel task that aims to describe the
differences between bitemporal images by natural language. Previous methods ignore a …
Bi-modal transformer-based approach for visual question answering in remote sensing imagery
Recently, vision-language models based on transformers are gaining popularity for joint
modeling of visual and textual modalities. In particular, they show impressive results when …
Language Integration in Remote Sensing: Tasks, datasets, and future directions
The emerging field of vision–language models, which combines computer vision and natural
language processing (NLP), has gained significant interest and exploration. This integration …
Global visual feature and linguistic state guided attention for remote sensing image captioning
Z Zhang, W Zhang, M Yan, X Gao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The encoder–decoder framework is prevalent in existing remote-sensing image captioning
(RSIC) models. The appearance of attention mechanisms brings significant results …
A novel SVM-based decoder for remote sensing image captioning
G Hoxha, F Melgani - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org
Most of the remote sensing image captioning (IC) models are based on encoder–decoder
frameworks where a convolutional neural network (CNN) encodes the image information …
Remote-sensing image captioning based on multilayer aggregated transformer
Remote-sensing image (RSI) captioning aims to automatically generate sentences
describing the content of RSIs. The multiscale information of RSIs contains attributes and …