COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Vision-Language Models (VLMs) trained with contrastive loss have achieved significant
advancements in various vision and language tasks. However, the global nature of …