RemoteCLIP: A vision language foundation model for remote sensing
General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …
A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues
H Sharma, D Padha - Artificial Intelligence Review, 2023 - Springer
Image captioning is a pretty modern area of the convergence of computer vision and natural
language processing and is widely used in a range of applications such as multi-modal …
Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval
Remote sensing (RS) cross-modal text-image retrieval has attracted extensive attention for
its advantages of flexible input and efficient query. However, traditional methods ignore the …
A deep semantic alignment network for the cross-modal image-text retrieval in remote sensing
Q Cheng, Y Zhou, P Fu, Y Xu… - IEEE Journal of Selected …, 2021 - ieeexplore.ieee.org
Because of the rapid growth of multimodal data from the internet and social media, a cross-
modal retrieval has become an important and valuable task in recent years. The purpose of …
Parameter-efficient transfer learning for remote sensing image-text retrieval
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …
Language Integration in Remote Sensing: Tasks, datasets, and future directions
The emerging field of vision–language models, which combines computer vision and natural
language processing (NLP), has gained significant interest and exploration. This integration …
A novel SVM-based decoder for remote sensing image captioning
G Hoxha, F Melgani - IEEE Transactions on Geoscience and …, 2021 - ieeexplore.ieee.org
Most of the remote sensing image captioning (IC) models are based on encoder–decoder
frameworks where a convolutional neural network (CNN) encodes the image information …
Change captioning: A new paradigm for multitemporal remote sensing image analysis
Change detection (CD) is among the most important applications in remote sensing (RS)
that allows identifying the changes that occurred in a given geographical area across …
SD-RSIC: Summarization-driven deep remote sensing image captioning
Deep neural networks (DNNs) have been recently found popular for image captioning
problems in remote sensing (RS). Existing DNN-based approaches rely on the availability of …
Multilanguage transformer for improved text to remote sensing image retrieval
Cross-modal text-image retrieval in remote sensing (RS) provides a flexible retrieval
experience for mining useful information from RS repositories. However, existing methods …