Vision transformers need registers
Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …
Cut and learn for unsupervised object detection and instance segmentation
We propose Cut-and-LEaRn (CutLER), a simple approach for training
unsupervised object detection and segmentation models. We leverage the property of self …
Scaling vision transformers to gigapixel images via hierarchical self-supervised learning
Vision Transformers (ViTs) and their multi-scale and hierarchical variations have
been successful at capturing image representations but their use has been generally …
Transformer-based visual segmentation: A survey
X Li, H Ding, H Yuan, W Zhang, J Pang… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
Neural feature fusion fields: 3d distillation of self-supervised 2d image representations
We present Neural Feature Fusion Fields (N3F), a method that improves dense 2D image
feature extractors when the latter are applied to the analysis of multiple images …
Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization
Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …
Deep ViT features as dense visual descriptors
We study the use of deep features extracted from a pretrained Vision Transformer (ViT) as
dense visual descriptors. We observe and empirically demonstrate that such features, when …
Bridging the gap to real-world object-centric learning
M Seitzer, M Horn, A Zadaianchuk, D Zietlow… - arxiv preprint arxiv …, 2022 - arxiv.org
Humans naturally decompose their environment into entities at the appropriate level of
abstraction to act in the world. Allowing machine learning algorithms to derive this …
Freesolo: Learning to segment objects without annotations
Instance segmentation is a fundamental vision task that aims to recognize and segment
each object in an image. However, it requires costly annotations such as bounding boxes …
Self-supervised learning of object parts for semantic segmentation
Progress in self-supervised learning has brought strong general image representation
learning methods. Yet so far, it has mostly focused on image-level learning. In turn, tasks …