Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
VLP: A survey on vision-language pre-training
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …
Dynamic modality interaction modeling for image-text retrieval
Image-text retrieval is a fundamental and crucial branch in information retrieval. Although
much progress has been made in bridging vision and language, it remains challenging …
Kaleido-BERT: Vision-language pre-training on fashion domain
We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which
introduces a novel kaleido strategy for fashion cross-modality representations from …
Image-text retrieval: A survey on recent research and development
In the past few years, cross-modal image-text retrieval (ITR) has experienced increased
interest in the research community due to its excellent research value and broad real-world …
M6: A Chinese multimodal pretrainer
In this work, we construct the largest dataset for multimodal pretraining in Chinese, which
consists of over 1.9TB images and 292GB texts that cover a wide range of domains. We …
EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval
… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …
FAME-ViL: Multi-tasking vision-language model for heterogeneous fashion tasks
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …
Cross-modal retrieval: a systematic review of methods and future directions
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …
Vision-and-language pretrained models: A survey
Pretrained models have produced great success in both Computer Vision (CV) and Natural
Language Processing (NLP). This progress leads to learning joint representations of vision …