Deep learning methods for semantic segmentation in remote sensing with small data: A survey
A Yu, Y Quan, R Yu, W Guo, X Wang, D Hong… - Remote Sensing, 2023 - mdpi.com
The annotations used during the training process are crucial for the inference results of
remote sensing images (RSIs) based on a deep learning framework. Unlabeled RSIs can be …
Fine-grained late-interaction multi-modal retrieval for retrieval augmented visual question answering
Knowledge-based Visual Question Answering (KB-VQA) requires VQA systems to
utilize knowledge from external knowledge bases to answer visually-grounded questions …
Poseidon: A data augmentation tool for small object detection datasets in maritime environments
Certain fields present significant challenges when attempting to train complex Deep
Learning architectures, particularly when the available datasets are limited and imbalanced …
Coati: Multimodal contrastive pretraining for representing and traversing chemical space
B Kaufman, EC Williams, C Underkoffler… - Journal of Chemical …, 2024 - ACS Publications
Creating a successful small molecule drug is a challenging multiparameter optimization
problem in an effectively infinite space of possible molecules. Generative models have …
Vision–language model for visual question answering in medical imagery
In the clinical and healthcare domains, medical images play a critical role. A mature medical
visual question answering system (VQA) can improve diagnosis by answering clinical …
Vision Transformers for Image Classification: A Comparative Survey
Y Wang, Y Deng, Y Zheng, P Chattopadhyay, L Wang - Technologies, 2025 - mdpi.com
Transformers were initially introduced for natural language processing, leveraging the self-
attention mechanism. They require minimal inductive biases in their design and can function …
Machine-to-machine visual dialoguing with ChatGPT for enriched textual image description
Image captioning is a technique that enables the automatic extraction of natural language
descriptions about the contents of an image. On the one hand, information in the form of …
Domain-specific chatbots for science using embeddings
KG Yager - Digital Discovery, 2023 - pubs.rsc.org
Large language models (LLMs) have emerged as powerful machine-learning systems
capable of handling a myriad of tasks. Tuned versions of these systems have been turned …
Summarization of videos with the signature transform
This manuscript presents a new benchmark for assessing the quality of visual summaries
without the need for human annotators. It is based on the Signature Transform, specifically …
A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text–Image Retrieval in Remote Sensing
X Zhang, W Li, X Wang, L Wang, F Zheng, L Wang… - Remote Sensing, 2023 - mdpi.com
In recent years, there has been a growing interest in remote sensing image–text cross-
modal retrieval due to the rapid development of space information technology and the …