Turnitin
降AI改写
早检测系统
早降重系统
Turnitin-UK版
万方检测-期刊版
维普编辑部版
Grammarly检测
Paperpass检测
checkpass检测
PaperYY检测
Foundation Models Defining a New Era in Vision: a Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
fundamental to understanding our world. The complex relations between objects and their …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
intelligence that have been developed in the last few years. We group these approaches …
Sigmoid loss for language image pre-training
We propose a simple pairwise sigmoid loss for image-text pre-training. Unlike standard
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
contrastive learning with softmax normalization, the sigmoid loss operates solely on image …
Datacomp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …
(DNNs) training, and they usually train a DNN for each single visual recognition task …
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …
large language models and vision transformers. Despite the abundance of literature on this …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
methods to data compression. Recent advances in statistical machine learning have opened …
Obelics: An open web-scale filtered dataset of interleaved image-text documents
Large multimodal models trained on natural documents, which interleave images and text,
outperform models trained on image-text pairs on various multimodal benchmarks …
outperform models trained on image-text pairs on various multimodal benchmarks …
Objaverse: A universe of annotated 3d objects
Massive data corpora like WebText, Wikipedia, Conceptual Captions, WebImageText, and
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …
LAION have propelled recent dramatic progress in AI. Large neural models trained on such …
Minicpm-v: A gpt-4v level mllm on your phone
The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally
reshaped the landscape of AI research and industry, shedding light on a promising path …
reshaped the landscape of AI research and industry, shedding light on a promising path …