Google Scholar

Cited by 280 - All 11 versions

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org

Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …

Cited by 519 - All 11 versions

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - European conference on …, 2022 - Springer

Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Cited by 187 - All 7 versions

Centralized feature pyramid for object detection

Y Quan, D Zhang, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a
variety of applications. However, current methods overly focus on inter-layer feature …

Cited by 824 - All 14 versions

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com

Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Cited by 140 - All 7 versions

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …

Cited by 131 - All 8 versions

GRIT: Faster and better image captioning transformer using dual visual features

VQ Nguyen, M Suganuma, T Okatani - European Conference on Computer …, 2022 - Springer

Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …

Cited by 79 - All 6 versions

ViTs for SITS: Vision transformers for satellite image time series

M Tarasiou, E Chavez… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a
fully-attentional model for general Satellite Image Time Series (SITS) processing based on the …