MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, J Dong, C Li, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Deep learning-based 3D point cloud classification: A systematic survey and outlook

H Zhang, C Wang, S Tian, B Lu, L Zhang, X Ning, X Bai - Displays, 2023 - Elsevier
In recent years, point cloud representation has become one of the research hotspots in the
field of computer vision, and has been widely used in many fields, such as autonomous …

Point Transformer V3: Simpler, faster, stronger

X Wu, L Jiang, PS Wang, Z Liu, X Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper is not motivated to seek innovation within the attention mechanism. Instead, it
focuses on overcoming the existing trade-offs between accuracy and efficiency within the …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding

L Xue, M Gao, C Xing, R Martín-Martín… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …

PointLLM: Empowering large language models to understand point clouds

R Xu, X Wang, T Wang, Y Chen, J Pang… - European Conference on …, 2024 - Springer
The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …

PointNeXt: Revisiting PointNet++ with improved training and scaling strategies

G Qian, Y Li, H Peng, J Mai… - Advances in neural …, 2022 - proceedings.neurips.cc
PointNet++ is one of the most influential neural architectures for point cloud understanding.
Although the accuracy of PointNet++ has been largely surpassed by recent networks such …

MeshGPT: Generating triangle meshes with decoder-only transformers

Y Siddiqui, A Alliegro, A Artemov… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MeshGPT, a new approach for generating triangle meshes that reflects the
compactness typical of artist-created meshes, in contrast to dense triangle meshes extracted …

MVImgNet: A large-scale dataset of multi-view images

X Yu, M Xu, Y Zhang, H Liu, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth
of ImageNet drives a remarkable trend of "learning from large-scale data" in computer vision …

Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …