Recent advances on loss functions in deep learning for computer vision

Y Tian, D Su, S Lauria, X Liu - Neurocomputing, 2022 - Elsevier
The loss function, also known as cost function, is used for training a neural network or other
machine learning models. Over the past decade, researchers have designed many loss …

Vision transformers for dense prediction: A survey

S Zuo, Y Xiao, X Chang, X Wang - Knowledge-based systems, 2022 - Elsevier
Transformers have demonstrated impressive expressiveness and transfer capability in
computer vision fields. Dense prediction is a fundamental problem in computer vision that is …

Vision transformers need registers

T Darcet, M Oquab, J Mairal, P Bojanowski - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have recently emerged as a powerful tool for learning visual representations.
In this paper, we identify and characterize artifacts in feature maps of both supervised and …

Simple open-vocabulary object detection

M Minderer, A Gritsenko, A Stone, M Neumann… - European conference on …, 2022 - Springer
Combining simple architectures with large-scale pre-training has led to massive
improvements in image classification. For object detection, pre-training and scaling …

Centralized feature pyramid for object detection

Y Quan, D Zhang, L Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The visual feature pyramid has shown its superiority in both effectiveness and efficiency in a
variety of applications. However, current methods overly focus on inter-layer feature …

Conformer: Local features coupling global representations for visual recognition

Z Peng, W Huang, S Gu, L Xie… - Proceedings of the …, 2021 - openaccess.thecvf.com
Within Convolutional Neural Network (CNN), the convolution operations are good
at extracting local features but experience difficulty to capture global representations. Within …

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …

GRIT: Faster and better image captioning transformer using dual visual features

VQ Nguyen, M Suganuma, T Okatani - European Conference on Computer …, 2022 - Springer
Current state-of-the-art methods for image captioning employ region-based features, as they
provide object-level information that is essential to describe the content of images; they are …

ViTs for SITS: Vision transformers for satellite image time series

M Tarasiou, E Chavez… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-
attentional model for general Satellite Image Time Series (SITS) processing based on the …

Cascade-DETR: delving into high-quality universal object detection

M Ye, L Ke, S Li, YW Tai, CK Tang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object localization in general environments is a fundamental part of vision systems. While
dominating on the COCO benchmark, recent Transformer-based detection methods are not …