SAM-CLIP: Merging vision foundation models towards semantic and spatial understanding

H Wang, PKA Vasu, F Faghri… - Proceedings of the …, 2024 - openaccess.thecvf.com
The landscape of publicly available vision foundation models (VFMs) such as CLIP and
SAM is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their …

LLM2CLIP: Powerful language model unlock richer visual representation

W Huang, A Wu, Y Yang, X Luo, Y Yang, L Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
CLIP is one of the most important multimodal foundational models today. What powers
CLIP's capabilities? The rich supervision signals provided by natural language, the carrier of …

Advancing Multi-Modal Sensing Through Expandable Modality Alignment

S Dai, S Jiang, Y Yang, T Cao, M Li, S Banerjee… - arXiv preprint arXiv …, 2024 - arxiv.org
Sensing technology is widely used for comprehending the physical world, with numerous
modalities explored in past decades. While there has been considerable work on multi …

SeFi-CD: A Semantic First Change Detection Paradigm That Can Detect Any Change You Want

L Zhao, Z Huang, Y Wang, C Peng, J Gan, H Li, C Hu - Remote Sensing, 2024 - mdpi.com
The existing change detection (CD) methods can be summarized as the visual-first change
detection (ViFi-CD) paradigm, which first extracts change features from visual differences …

LLM2CLIP: Extending the Capability Boundaries of CLIP through Large Language Models

A Wu, Y Yang, X Luo, Y Yang, L Hu, Q Dai, X Dai… - openreview.net
CLIP is one of the most important multimodal foundational models today, aligning visual and
textual signals into a shared feature space using a simple contrastive learning loss on large …

LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

A Wu, Y Yang, X Luo, Y Yang, C Wang, L Hu… - NeurIPS 2024 Workshop … - openreview.net
CLIP is one of the most important foundational multimodal models today. It aligns image and
text modalities into a shared feature space by leveraging a simple contrastive learning loss …