On the opportunities and challenges of foundation models for geospatial artificial intelligence

G Mai, W Huang, J Sun, S Song, D Mishra… - arXiv preprint arXiv …, 2023 - arxiv.org
Large pre-trained models, also known as foundation models (FMs), are trained in a task-
agnostic manner on large-scale data and can be adapted to a wide range of downstream …

A survey on dataset quality in machine learning

Y Gong, G Liu, Y Xue, R Li, L Meng - Information and Software Technology, 2023 - Elsevier
With the rise of big data, the quality of datasets has become a crucial factor affecting the
performance of machine learning models. High-quality datasets are essential for the …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

MMBench: Is your multi-modal model an all-around player?

Y Liu, H Duan, Y Zhang, B Li, S Zhang, W Zhao… - European conference on …, 2024 - Springer
Large vision-language models (VLMs) have recently achieved remarkable progress,
exhibiting impressive multimodal perception and reasoning abilities. However, effectively …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

DiffIR: Efficient diffusion model for image restoration

B Xia, Y Zhang, S Wang, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion model (DM) has achieved SOTA performance by modeling the image synthesis
process into a sequential application of a denoising network. However, different from image …

Unified-IO 2: Scaling autoregressive multimodal models with vision, language, audio, and action

J Lu, C Clark, S Lee, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Unified-IO 2, a multimodal and multi-skill unified model capable of following
novel instructions. Unified-IO 2 can use text, images, audio, and/or videos as input and can …

Quilt-1M: One million image-text pairs for histopathology

W Ikezogwo, S Seyfioglu, F Ghezloo… - Advances in neural …, 2023 - proceedings.neurips.cc
Recent accelerations in multi-modal applications have been made possible with the
plethora of image and text data available online. However, the scarcity of analogous data in …

Out-of-distribution detection with deep nearest neighbors

Y Sun, Y Ming, X Zhu, Y Li - International Conference on …, 2022 - proceedings.mlr.press
Out-of-distribution (OOD) detection is a critical task for deploying machine learning
models in the open world. Distance-based methods have demonstrated promise, where …