RemoteCLIP: A vision language foundation model for remote sensing

F Liu, D Chen, Z Guan, X Zhou, J Zhu… - … on Geoscience and …, 2024 - ieeexplore.ieee.org
General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …

SAM-CLIP: Merging vision foundation models towards semantic and spatial understanding

H Wang, PKA Vasu, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
The landscape of publicly available vision foundation models (VFMs) such as CLIP and
SAM is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their …

AnyLoc: Towards Universal Visual Place Recognition

N Keetha, A Mishra, J Karhade… - IEEE Robotics and …, 2023 - ieeexplore.ieee.org
Visual Place Recognition (VPR) is vital for robot localization. To date, the most performant
VPR approaches are environment- and task-specific: while they exhibit strong performance …

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - European Conference on …, 2024 - Springer
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

Scaling self-supervised learning for histopathology with masked image modeling

A Filiot, R Ghermi, A Olivier, P Jacob, L Fidon… - medRxiv, 2023 - medrxiv.org
Computational pathology is revolutionizing the field of pathology by integrating advanced
computer vision and machine learning technologies into diagnostic workflows. It offers …

CROMA: Remote sensing representations with contrastive radar-optical masked autoencoders

A Fuller, K Millard, J Green - Advances in Neural …, 2023 - proceedings.neurips.cc
A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled,
spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable …

Rotary position embedding for vision transformer

B Heo, S Park, D Han, S Yun - European Conference on Computer Vision, 2024 - Springer
Rotary Position Embedding (RoPE) performs remarkably on language models,
especially for length extrapolation of Transformers. However, the impacts of RoPE on …
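The rotary embedding this entry refers to has a compact closed form: each pair of feature dimensions is rotated by an angle proportional to the token's position, so dot products between rotated queries and keys depend only on their relative offset. Below is a minimal NumPy sketch of the original 1-D RoPE formulation (the function name `rope` and the pairing of dimensions as first-half/second-half are illustrative choices, not the ViT-specific variant the paper studies):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Apply rotary position embedding to a feature vector x at position pos."""
    d = x.shape[0]          # feature dimension, must be even
    half = d // 2
    # Per-pair rotation frequencies, geometric in the pair index.
    freqs = base ** (-np.arange(half) / half)
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:half], x[half:]
    # Rotate each (x1[i], x2[i]) pair by angle theta[i].
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

# Relative-position property: <rope(q, m), rope(k, n)> depends only on m - n.
rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)
d1 = rope(q, 3) @ rope(k, 1)   # positions (3, 1), offset 2
d2 = rope(q, 7) @ rope(k, 5)   # positions (7, 5), same offset 2
```

Because each pair-wise rotation is orthogonal, the attention score between a query at position m and a key at position n is a function of m - n alone, which is the property behind RoPE's length-extrapolation behavior on language models.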

Reverse engineering self-supervised learning

I Ben-Shaul, R Shwartz-Ziv, T Galanti… - Advances in …, 2023 - proceedings.neurips.cc
Understanding the learned representation and underlying mechanisms of Self-Supervised
Learning (SSL) often poses a challenge. In this paper, we 'reverse engineer' SSL, conducting …

SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting

H Kim, M Jang, W Yoon, J Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a co-designed approach for human portrait relighting that combines a physics-
guided architecture with a pre-training framework. Drawing on the Cook-Torrance …

Know your self-supervised learning: a survey on image-based generative and discriminative training

U Ozbulak, HJ Lee, B Boga, ET Anzaku, H Park… - arXiv preprint arXiv …, 2023 - arxiv.org
Although supervised learning has been highly successful in improving the state-of-the-art in
the domain of image-based computer vision in the past, the margin of improvement has …