A survey on self-supervised learning: Algorithms, applications, and future trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024‏ - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

From google gemini to openai q*(q-star): A survey of resha** the generative artificial intelligence (ai) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arxiv preprint arxiv …, 2023‏ - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Point transformer v3: Simpler faster stronger

X Wu, L Jiang, PS Wang, Z Liu, X Liu… - Proceedings of the …, 2024‏ - openaccess.thecvf.com
This paper is not motivated to seek innovation within the attention mechanism. Instead it
focuses on overcoming the existing trade-offs between accuracy and efficiency within the …

Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
Abstract We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding

L Xue, M Gao, C **ng, R Martín-Martín… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …

Mvimgnet: A large-scale dataset of multi-view images

X Yu, M Xu, Y Zhang, H Liu, C Ye… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth
of ImageNet drives a remarkable trend of" learning from large-scale data" in computer vision …

Contrast with reconstruct: Contrastive 3d representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023‏ - proceedings.mlr.press
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …

3d-vista: Pre-trained transformer for 3d vision and text alignment

Z Zhu, X Ma, Y Chen, Z Deng… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
Abstract 3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - IEEE transactions on …, 2024‏ - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Ulip-2: Towards scalable multimodal pre-training for 3d understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024‏ - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes their 2D …