Deep learning methods for semantic segmentation in remote sensing with small data: A survey

A Yu, Y Quan, R Yu, W Guo, X Wang, D Hong… - Remote Sensing, 2023 - mdpi.com
The annotations used during the training process are crucial for the inference results of
remote sensing images (RSIs) based on a deep learning framework. Unlabeled RSIs can be …

Fine-grained late-interaction multi-modal retrieval for retrieval augmented visual question answering

W Lin, J Chen, J Mei, A Coca… - Advances in Neural …, 2024 - proceedings.neurips.cc
Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to
utilize knowledge from external knowledge bases to answer visually-grounded questions …

Poseidon: A data augmentation tool for small object detection datasets in maritime environments

P Ruiz-Ponce, D Ortiz-Perez, J Garcia-Rodriguez… - Sensors, 2023 - mdpi.com
Certain fields present significant challenges when attempting to train complex Deep
Learning architectures, particularly when the available datasets are limited and imbalanced …

Coati: Multimodal contrastive pretraining for representing and traversing chemical space

B Kaufman, EC Williams, C Underkoffler… - Journal of Chemical …, 2024 - ACS Publications
Creating a successful small molecule drug is a challenging multiparameter optimization
problem in an effectively infinite space of possible molecules. Generative models have …

Vision–language model for visual question answering in medical imagery

Y Bazi, MMA Rahhal, L Bashmal, M Zuair - Bioengineering, 2023 - mdpi.com
In the clinical and healthcare domains, medical images play a critical role. A mature medical
visual question answering system (VQA) can improve diagnosis by answering clinical …

Vision Transformers for Image Classification: A Comparative Survey

Y Wang, Y Deng, Y Zheng, P Chattopadhyay, L Wang - Technologies, 2025 - mdpi.com
Transformers were initially introduced for natural language processing, leveraging the
self-attention mechanism. They require minimal inductive biases in their design and can function …

Machine-to-machine visual dialoguing with ChatGPT for enriched textual image description

R Ricci, Y Bazi, F Melgani - Remote Sensing, 2024 - mdpi.com
Image captioning is a technique that enables the automatic extraction of natural language
descriptions about the contents of an image. On the one hand, information in the form of …

Domain-specific chatbots for science using embeddings

KG Yager - Digital Discovery, 2023 - pubs.rsc.org
Large language models (LLMs) have emerged as powerful machine-learning systems
capable of handling a myriad of tasks. Tuned versions of these systems have been turned …

Summarization of videos with the signature transform

J de Curtò, I de Zarzà, G Roig, CT Calafate - Electronics, 2023 - mdpi.com
This manuscript presents a new benchmark for assessing the quality of visual summaries
without the need for human annotators. It is based on the Signature Transform, specifically …

A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text–Image Retrieval in Remote Sensing

X Zhang, W Li, X Wang, L Wang, F Zheng, L Wang… - Remote Sensing, 2023 - mdpi.com
In recent years, there has been a growing interest in remote sensing image–text
cross-modal retrieval due to the rapid development of space information technology and the …