Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

Not all features matter: Enhancing few-shot CLIP with adaptive prior refinement

X Zhu, R Zhang, B He, A Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its
application to diverse downstream vision tasks. To improve its capacity on downstream …

Parameter is not all you need: Starting from non-parametric networks for 3D point cloud analysis

R Zhang, L Wang, Z Guo, Y Wang, P Gao, H Li… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a Non-parametric Network for 3D point cloud analysis, Point-NN, which consists
of purely non-learnable components: farthest point sampling (FPS), k-nearest neighbors (k …

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding

Z Guo, Y Tang, R Zhang, D Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …

Joint-MAE: 2D-3D joint masked autoencoders for 3D point cloud pre-training

Z Guo, R Zhang, L Qiu, X Li, PA Heng - arXiv preprint arXiv:2302.14007, 2023 - arxiv.org
Masked Autoencoders (MAE) have shown promising performance in self-supervised
learning for both 2D and 3D computer vision. However, existing MAE-style methods can only …

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

X Zhu, R Zhang, B He, Z Guo, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to
few-shot learning. Current 3D few-shot segmentation methods first pre-train models …

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding with GPT and prototype guidance

Z Guo, Y Tang, R Zhang, D Wang, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …

Point-PEFT: Parameter-efficient fine-tuning for 3D pre-trained models

Y Tang, R Zhang, Z Guo, X Ma, B Zhao… - Proceedings of the …, 2024 - ojs.aaai.org
The popularity of pre-trained large models has revolutionized downstream tasks across
diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost …

TabR: Unlocking the power of retrieval-augmented tabular deep learning

Y Gorishniy, I Rubachev, N Kartashev… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning (DL) models for tabular data problems are receiving increasingly more
attention, while the algorithms based on gradient-boosted decision trees (GBDT) remain a …

Point cloud semantic segmentation with adaptive spatial structure graph transformer

T Han, Y Chen, J Ma, X Liu, W Zhang, X Zhang… - International Journal of …, 2024 - Elsevier
With the rapid development of LiDAR and artificial intelligence technologies, 3D point cloud
semantic segmentation has become a highlight research topic. This technology is able to …