EfficientFormer: Vision transformers at MobileNet speed
Vision Transformers (ViT) have shown rapid progress in computer vision tasks,
achieving promising results on various benchmarks. However, due to the massive number of …
Scaling & shifting your features: A new baseline for efficient model tuning
Existing fine-tuning methods either tune all parameters of the pre-trained model (full
fine-tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …
DeiT III: Revenge of the ViT
A Vision Transformer (ViT) is a simple neural architecture amenable to serve
several computer vision tasks. It has limited built-in architectural priors, in contrast to more …
Surgical fine-tuning improves adaptation to distribution shifts
A common approach to transfer learning under distribution shift is to fine-tune the last few
layers of a pre-trained model, preserving learned features while also adapting to the new …
ONE-PEACE: Exploring one general representation model toward unlimited modalities
In this work, we explore a scalable way for building a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …
Masked world models for visual control
Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient
robot learning from visual observations. Yet the current approaches typically train a single …
No representation rules them all in category discovery
In this paper we tackle the problem of Generalized Category Discovery (GCD). Specifically,
given a dataset with labelled and unlabelled images, the task is to cluster all images in the …
ConvMAE: Masked convolution meets masked autoencoders
Vision Transformers (ViT) have become widely adopted architectures for various vision tasks.
Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer …
RangeViT: Towards vision transformers for 3D semantic segmentation in autonomous driving
Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via
range projection, is an effective and popular approach. These projection-based methods …
CAT-Seg: Cost aggregation for open-vocabulary semantic segmentation
Open-vocabulary semantic segmentation presents the challenge of labeling each pixel
within an image based on a wide range of text descriptions. In this work we introduce a …