Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
Unsupervised point cloud representation learning with deep neural networks: A survey
Point cloud data have been widely explored due to its superior accuracy and robustness
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …
Masked autoencoders for point cloud self-supervised learning
As a promising scheme of self-supervised learning, masked autoencoding has significantly
advanced natural language processing and computer vision. Inspired by this, we propose a …
Clip2scene: Towards label-efficient 3d scene understanding by clip
Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D
zero-shot and few-shot learning. Despite the impressive performance in 2D, applying CLIP …
Point-m2ae: multi-scale masked autoencoders for hierarchical point cloud pre-training
Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …
Learning 3d representations from 2d pre-trained models via image-to-point masked autoencoders
Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …
Crosspoint: Self-supervised cross-modal contrastive learning for 3d point cloud understanding
M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …
3d-vista: Pre-trained transformer for 3d vision and text alignment
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …
Language-grounded indoor 3d semantic segmentation in the wild
Recent advances in 3D semantic segmentation with deep neural networks have shown
remarkable success, with rapid performance increase on available datasets. However …
ProposalContrast: Unsupervised Pre-training for LiDAR-Based 3D Object Detection
Existing approaches for unsupervised point cloud pre-training are constrained to either
scene-level or point/voxel-level instance discrimination. Scene-level methods tend to lose …