SkyEyeGPT: Unifying remote sensing vision-language tasks via instruction tuning with large language model

Y Zhan, Z Xiong, Y Yuan - ISPRS Journal of Photogrammetry and Remote …, 2025 - Elsevier
Large language models (LLMs) have recently been extended to the vision-language realm,
obtaining impressive general multi-modal capabilities. However, the exploration of multi …

When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system

H Zhang, JJ Xu, HW Cui, L Li, Y Yang… - … and Remote Sensing …, 2024 - ieeexplore.ieee.org
Artificial intelligence (AI) has significantly advanced Earth sciences, yet its full potential in
comprehensively modeling Earth's complex dynamics remains unrealized. Geoscience …

Multi-modal LLMs in agriculture: A comprehensive review

R Sapkota, R Qureshi, SZ Hassan, J Shutske… - Authorea …, 2024 - techrxiv.org
Given the rapid emergence and applications of Large Language Models (LLMs) across
various scientific fields, insights regarding their applicability in agriculture are still only …

Foundation Model-based Spectral-Spatial Transformer for Hyperspectral Image Classification

L Huang, Y Chen, X He - IEEE Transactions on Geoscience …, 2024 - ieeexplore.ieee.org
Recently, deep learning models have dominated hyperspectral image (HSI) classification.
Nowadays, deep learning is undergoing a paradigm shift with the rise of transformer-based …

RS-MoE: Mixture of experts for remote sensing image captioning and visual question answering

H Lin, D Hong, S Ge, C Luo, K Jiang, H Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Remote Sensing Image Captioning (RSIC) presents unique challenges and plays a critical
role in applications. Traditional RSIC methods often struggle to produce rich and diverse …

Leveraging visual language model and generative diffusion model for zero-shot SAR target recognition

J Wang, H Sun, T Tang, Y Sun, Q He, L Lin… - Remote …, 2024 - search.proquest.com
Simulated data play an important role in SAR target recognition, particularly under zero-shot
learning (ZSL) conditions caused by the lack of training samples. The traditional SAR …

UrbanCross: Enhancing satellite image-text retrieval with cross-domain adaptation

S Zhong, X Hao, Y Yan, Y Zhang, Y Song… - Proceedings of the 32nd …, 2024 - dl.acm.org
Urbanization challenges underscore the necessity for effective satellite image-text retrieval
methods to swiftly access specific information enriched with geographic semantics for urban …

ChatEarthNet: A global-scale, high-quality image-text dataset for remote sensing

Z Yuan, Z Xiong, L Mou, XX Zhu - arXiv preprint arXiv:2402.11325, 2024 - arxiv.org
An in-depth comprehension of global land cover is essential in Earth observation, forming
the foundation for a multitude of applications. Although remote sensing technology has …

From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing

X Sun, B Peng, C Zhang, F Jin, Q Niu, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Remote sensing has evolved from simple image acquisition to complex systems capable of
integrating and processing visual and textual data. This review examines the development …

HCNet: Hierarchical Feature Aggregation and Cross-Modal Feature Alignment for Remote Sensing Image Captioning

Z Yang, Q Li, Y Yuan, Q Wang - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Remote sensing image captioning (RSIC) aims to describe the crucial objects from remote
sensing images in the form of natural language. The inefficient utilization of object texture …