Bringing masked autoencoders explicit contrastive properties for point cloud self-supervised learning
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved
performance comparable to CL for traditional convolutional backbones. However, in 3D …