A survey on self-supervised learning: Algorithms, applications, and future trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …
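
The core operator named in the title is a deformable convolution. As a rough, generic illustration only (InternImage's actual DCNv3 block adds group-wise aggregation and other changes not reproduced here), the sketch below builds a DCNv2-style modulated deformable convolution on top of torchvision.ops.deform_conv2d, predicting per-position offsets and modulation masks from the input; the module name DeformableConvBlock is ours.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableConvBlock(nn.Module):
    """Toy DCNv2-style block: offsets and modulation masks are predicted
    from the input and passed to the deformable convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.k = k
        # 2*k*k offsets (x, y per sampling point) and k*k modulation scalars
        self.offset_pred = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
        self.mask_pred = nn.Conv2d(in_ch, k * k, kernel_size=k, padding=k // 2)
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        offset = self.offset_pred(x)              # (B, 2*k*k, H, W)
        mask = torch.sigmoid(self.mask_pred(x))   # (B, k*k, H, W), in [0, 1]
        return deform_conv2d(x, offset, self.weight, self.bias,
                             padding=self.k // 2, mask=mask)

x = torch.randn(2, 16, 32, 32)
y = DeformableConvBlock(16, 32)(x)
print(y.shape)  # torch.Size([2, 32, 32, 32])
```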

Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training on large-scale image data has become the de facto route to robust 2D representations. In
contrast, owing to expensive data collection and processing, a paucity of 3D datasets severely hinders …

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …
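
As a toy illustration of the two assignment schemes the abstract contrasts, the sketch below matches predictions to ground truths with scipy.optimize.linear_sum_assignment: first one-to-one over all queries (DETR-style), then independently within query groups so that each ground truth collects one positive per group (the group-wise one-to-many idea). The plain L2 box cost and the helper names are ours; the real DETR matching cost combines classification, L1, and GIoU terms.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_match(pred_boxes, gt_boxes):
    """DETR-style bipartite matching: each ground-truth box is assigned to
    exactly one prediction by minimizing a cost matrix (toy L2 cost here)."""
    cost = np.linalg.norm(pred_boxes[:, None, :] - gt_boxes[None, :, :], axis=-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return list(zip(pred_idx.tolist(), gt_idx.tolist()))

def group_wise_match(pred_boxes, gt_boxes, num_groups):
    """Group-wise training assignment: queries are split into groups and each
    group is matched one-to-one independently, so every ground truth receives
    one positive prediction per group (one-to-many overall)."""
    groups = np.array_split(np.arange(len(pred_boxes)), num_groups)
    matches = []
    for g in groups:
        for p, t in one_to_one_match(pred_boxes[g], gt_boxes):
            matches.append((int(g[p]), t))
    return matches

preds = np.random.rand(12, 4)   # 12 predicted boxes (cx, cy, w, h)
gts = np.random.rand(3, 4)      # 3 ground-truth boxes
print(one_to_one_match(preds, gts))     # 3 matched pairs
print(group_wise_match(preds, gts, 3))  # 9 matched pairs (3 per group)
```

At inference, only one query group is kept, so the detector still produces one prediction per object without NMS.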

Hard patches mining for masked image modeling

H Wang, K Song, J Fan, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …

Mixed autoencoder for self-supervised visual representation learning

K Chen, Z Liu, L Hong, H Xu, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks
via randomly masking image patches and reconstruction. However, effective data …
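
The snippet refers to the standard MAE recipe of random patch masking and pixel reconstruction. The minimal sketch below (a toy TinyMAE module with made-up dimensions) shows that recipe, with the encoder applied only to visible patches and the loss computed only on masked ones; it does not implement the mixing strategy of the Mixed Autoencoder paper itself.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Minimal MAE-style sketch: mask random patches, encode only the visible
    ones, decode with mask tokens, and reconstruct pixels of masked patches."""
    def __init__(self, num_patches=196, patch_dim=768, dim=128, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(patch_dim, dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(dim, patch_dim)
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, patches):                       # (B, N, patch_dim) flattened pixels
        B, N, _ = patches.shape
        num_keep = int(N * (1 - self.mask_ratio))
        noise = torch.rand(B, N, device=patches.device)
        ids_keep = noise.argsort(dim=1)[:, :num_keep]  # random subset of patch indices

        x = self.embed(patches) + self.pos
        visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, x.size(-1)))
        latent = self.encoder(visible)                 # encoder sees visible patches only

        # scatter encoded visible tokens back; remaining slots hold the mask token
        full = self.mask_token.expand(B, N, -1).clone()
        full.scatter_(1, ids_keep.unsqueeze(-1).expand(-1, -1, x.size(-1)), latent)
        recon = self.head(self.decoder(full + self.pos))

        # regression loss only on masked positions
        mask = torch.ones(B, N, device=patches.device)
        mask.scatter_(1, ids_keep, 0.0)
        loss = (((recon - patches) ** 2).mean(-1) * mask).sum() / mask.sum()
        return loss

patches = torch.randn(2, 196, 768)   # e.g. 14x14 patches of 16x16x3 pixels
print(TinyMAE()(patches).item())
```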

Masked modeling for self-supervised representation learning on vision and beyond

S Li, L Zhang, Z Wang, D Wu, L Wu, Z Liu, J Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked autoencoders are scalable vision learners, as per the title of MAE (He et al., 2022),
which suggests that self-supervised learning (SSL) in vision might undertake a similar …

HiViT: A simpler and more efficient design of hierarchical vision transformer

X Zhang, Y Tian, L Xie, W Huang, Q Dai… - The Eleventh …, 2023 - openreview.net
There has been a debate on the choice of plain vs. hierarchical vision transformers, where
researchers often believe that the former (e.g., ViT) has a simpler design but the latter (e.g., …

Improving pixel-based MIM by reducing wasted modeling capability

Y Liu, S Zhang, J Chen, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …
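
To make the "reconstruction target" distinction concrete for the pixel-based group, the sketch below builds a per-patch pixel target, either raw pixels or pixels standardized by each patch's own mean and variance (the norm_pix_loss option popularized by MAE), and applies a regression loss over masked patches only. The function names are ours, and nothing here reproduces the paper's specific method for reducing wasted modeling capability.

```python
import torch

def pixel_targets(patches, normalize=True, eps=1e-6):
    """Build reconstruction targets for pixel-based MIM.

    patches: (B, N, P) flattened per-patch pixels. With normalize=True each
    patch is standardized by its own mean/variance; with normalize=False the
    raw pixels are regressed directly."""
    if not normalize:
        return patches
    mean = patches.mean(dim=-1, keepdim=True)
    var = patches.var(dim=-1, keepdim=True)
    return (patches - mean) / (var + eps).sqrt()

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only (mask: 1 = masked)."""
    per_patch = ((pred - target) ** 2).mean(dim=-1)    # (B, N)
    return (per_patch * mask).sum() / mask.sum().clamp(min=1)

patches = torch.randn(2, 196, 768)
pred = torch.randn(2, 196, 768)
mask = (torch.rand(2, 196) < 0.75).float()
print(masked_mse(pred, pixel_targets(patches), mask).item())
```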