CNN architectures for geometric transformation-invariant feature representation in computer vision: a review

A Mumuni, F Mumuni - SN Computer Science, 2021 - Springer
One of the main challenges in machine vision relates to the problem of obtaining robust
representation of visual features that remain unaffected by geometric transformations. This …

Deep high-resolution representation learning for visual recognition

J Wang, K Sun, T Cheng, B Jiang… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
High-resolution representations are essential for position-sensitive vision problems, such as
human pose estimation, semantic segmentation, and object detection. Existing state-of-the …

Deep high-resolution representation learning for human pose estimation

K Sun, B Xiao, D Liu, J Wang - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
In this paper, we are interested in the human pose estimation problem with a focus on
learning reliable high-resolution representations. Most existing methods recover high …

On translation invariance in cnns: Convolutional layers can exploit absolute spatial location

OS Kayhan, JC Gemert - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
In this paper we challenge the common assumption that convolutional layers in modern
CNNs are translation invariant. We show that CNNs can and will exploit the absolute spatial …

Why do deep convolutional networks generalize so poorly to small image transformations?

A Azulay, Y Weiss - Journal of Machine Learning Research, 2019 - jmlr.org
Convolutional Neural Networks (CNNs) are commonly assumed to be invariant to
small image transformations: either because of the convolutional architecture or because …

Text2light: Zero-shot text-driven hdr panorama generation

Z Chen, G Wang, Z Liu - ACM Transactions on Graphics (TOG), 2022 - dl.acm.org
High-quality HDRIs (High Dynamic Range Images), typically HDR panoramas, are one of
the most popular ways to create photorealistic lighting and 360-degree reflections of 3D …

Chaos is a ladder: A new theoretical understanding of contrastive learning via augmentation overlap

Y Wang, Q Zhang, Y Wang, J Yang, Z Lin - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, contrastive learning has risen to be a promising approach for large-scale self-
supervised learning. However, theoretical understanding of how it works is still unclear. In …

Deviant: Depth equivariant network for monocular 3d object detection

A Kumar, G Brazil, E Corona, A Parchami… - European Conference on …, 2022 - Springer
Modern neural networks use building blocks such as convolutions that are equivariant to
arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D …

Scale-aware fast R-CNN for pedestrian detection

J Li, X Liang, SM Shen, T Xu, J Feng… - IEEE transactions on …, 2017 - ieeexplore.ieee.org
In this paper, we consider the problem of pedestrian detection in natural scenes. Intuitively,
instances of pedestrians with different spatial scales may exhibit dramatically different …

Why self-attention? a targeted evaluation of neural machine translation architectures

G Tang, M Müller, A Rios, R Sennrich - arXiv preprint arXiv:1808.08946, 2018 - arxiv.org
Recently, non-recurrent architectures (convolutional, self-attentional) have outperformed
RNNs in neural machine translation. CNNs and self-attentional networks can connect distant …