CleanCLIP: Mitigating data poisoning attacks in multimodal contrastive learning
Multimodal contrastive pretraining has been used to train multimodal representation models, such as CLIP, on large amounts of paired image-text data. However, previous studies have …
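The snippet above names the CLIP-style contrastive pretraining objective without spelling it out. As a minimal sketch (assuming PyTorch; the function name, batch size, and random embeddings standing in for encoder outputs are illustrative, not taken from the paper), the symmetric InfoNCE loss over a batch of paired image/text embeddings looks roughly like this:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over paired image/text embeddings.

    image_emb, text_emb: (batch, dim) outputs of the two encoders.
    Each image is pulled toward its paired caption and pushed away
    from every other caption in the batch, and vice versa.
    """
    # L2-normalize so the dot product is a cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix; diagonal entries are true pairs.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy in both directions (image->text and text->image).
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage with random tensors in place of real encoder outputs.
if __name__ == "__main__":
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(clip_contrastive_loss(img, txt))
```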
Spurious correlations in machine learning: A survey
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the …
Robust learning with progressive data expansion against spurious correlation
While deep learning models have shown remarkable performance in various tasks, they are susceptible to learning non-generalizable spurious features rather than the core features …
Distilling vision-language models on millions of videos
The recent advance in vision-language models is largely attributed to the abundance of image-text data. We aim to replicate this success for video-language models, but there …
Sieve: Multimodal dataset pruning using image captioning models
Vision-Language Models (VLMs) are pretrained on large, diverse, and noisy web-crawled datasets. This underscores the critical need for dataset pruning, as the quality of …
Calibrating multi-modal representations: A pursuit of group robustness without annotations
Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning …
A Sober Look at the Robustness of CLIPs to Spurious Features
Large vision language models, such as CLIP, demonstrate greater robustness to spurious features than single-modal models trained on ImageNet. However, existing test …
FD-Align: Feature discrimination alignment for fine-tuning pre-trained models in few-shot learning
Due to the limited availability of data, existing few-shot learning methods trained from scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models …
Prompting is a double-edged sword: improving worst-group robustness of foundation models
Machine learning models fail catastrophically under distribution shift, but a surprisingly effective way to empirically improve robustness to some types of shift (e.g., ImageNet-A/C) …
Zero-shot robustification of zero-shot models
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are …
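For readers unfamiliar with the zero-shot setup the last snippet refers to, the following is a minimal sketch of CLIP-style zero-shot classification using OpenAI's clip package: class names are turned into text prompts and the image is assigned to the most similar prompt. The image path and the landbird/waterbird label set are placeholders for illustration, not taken from the paper above.

```python
import torch
import clip  # OpenAI's CLIP package: https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Class names become text prompts; no task-specific training is involved.
class_names = ["landbird", "waterbird"]  # hypothetical label set
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    # Cosine similarity between the image and each class prompt.
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    probs = (image_feat @ text_feat.t()).softmax(dim=-1)

print({c: float(p) for c, p in zip(class_names, probs[0])})
```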