Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Revolutionizing digital pathology with the power of generative artificial intelligence and foundation models
Digital pathology has transformed the traditional pathology practice of analyzing tissue
under a microscope into a computer vision workflow. Whole-slide imaging allows …
Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
YOLO-World: Real-time open-vocabulary object detection
The You Only Look Once (YOLO) series of detectors have established themselves as efficient
and practical tools. However, their reliance on predefined and trained object …
Recognize anything: A strong image tagging model
We present the Recognize Anything Model (RAM): a strong foundation model for image
tagging. RAM makes a substantial step for foundation models in computer vision …
Self-regulating prompts: Foundational model adaptation without forgetting
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
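The MaPLe snippet above notes that CLIP-style models are sensitive to the wording of their text prompts. As a minimal sketch of that effect (not taken from any of the listed papers), the Python example below scores one image against the same label set under two different prompt templates, assuming the Hugging Face transformers CLIP API; the image path and labels are purely illustrative.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (zero-shot, no fine-tuning).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
labels = ["cat", "dog", "bird"]    # illustrative label set

# Score the same labels under two prompt templates; the predicted
# class probabilities can shift noticeably between templates.
for template in ("a photo of a {}", "a blurry photo of a {}"):
    prompts = [template.format(label) for label in labels]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(template, probs.squeeze().tolist())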
CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching
Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects
from novel categories beyond the base categories on which the detector is trained. Recent …