- Academic Search

Unsupervised point cloud representation learning with deep neural networks: A survey

A Xiao, J Huang, D Guan, X Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Point cloud data have been widely explored due to its superior accuracy and robustness
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …

Masked autoencoders for point cloud self-supervised learning

Y Pang, W Wang, FEH Tay, W Liu, Y Tian… - European conference on …, 2022 - Springer

As a promising scheme of self-supervised learning, masked autoencoding has significantly
advanced natural language processing and computer vision. Inspired by this, we propose a …

CLIP2Scene: Towards label-efficient 3D scene understanding by CLIP

R Chen, Y Liu, L Kong, X Zhu, Y Ma… - Proceedings of the …, 2023 - openaccess.thecvf.com

Contrastive Language-Image Pre-training (CLIP) achieves promising results in 2D
zero-shot and few-shot learning. Despite the impressive performance in 2D, applying CLIP …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc

Masked Autoencoders (MAE) have shown great potentials in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …

Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

CrossPoint: Self-supervised cross-modal contrastive learning for 3D point cloud understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com

Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …

3D-VISTA: Pre-trained transformer for 3D vision and text alignment

Z Zhu, X Ma, Y Chen, Z Deng… - Proceedings of the …, 2023 - openaccess.thecvf.com

3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …

Language-grounded indoor 3D semantic segmentation in the wild

D Rozenberszki, O Litany, A Dai - European Conference on Computer …, 2022 - Springer

Recent advances in 3D semantic segmentation with deep neural networks have shown
remarkable success, with rapid performance increase on available datasets. However …