AdaptFormer: Adapting vision transformers for scalable visual recognition
Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …
ImageNet-21K pretraining for the masses
T Ridnik, E Ben-Baruch, A Noy… - ar** better image captioning
models, yet most of them rely on a separate object detector to extract regional features …
ML-Decoder: Scalable and versatile classification head
In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-
Decoder predicts the existence of class labels via queries, and enables better utilization of …
Re-labeling ImageNet: from single to multi-labels, from global to localized labels
ImageNet has been the most popular image classification benchmark, but it is also the one
with a significant level of label noise. Recent studies have shown that many samples contain …
MRN: A locally and globally mention-based reasoning network for document-level relation extraction
Document-level relation extraction aims to detect the relations within one document, which is
challenging since it requires complex reasoning using mentions, entities, local and global …
CDUL: CLIP-driven unsupervised learning for multi-label image classification
This paper presents a CLIP-based unsupervised learning method for annotation-free multi-
label image classification, including three stages: initialization, training, and inference. At the …
Triplet attention and dual-pool contrastive learning for clinic-driven multi-label medical image classification
Multi-label classification (MLC) can attach multiple labels to a single image, and has
achieved promising results on medical images. But existing MLC methods still face …