One-2-3-45: Any single image to 3d mesh in 45 seconds without per-shape optimization
Single image 3D reconstruction is an important but challenging task that requires extensive
knowledge of our natural world. Many existing methods solve this problem by optimizing a …
knowledge of our natural world. Many existing methods solve this problem by optimizing a …
Objaverse-xl: A universe of 10m+ 3d objects
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …
Pointllm: Empowering large language models to understand point clouds
The unprecedented advancements in Large Language Models (LLMs) have shown a
profound impact on natural language processing but are yet to fully embrace the realm of 3D …
profound impact on natural language processing but are yet to fully embrace the realm of 3D …
Ulip-2: Towards scalable multimodal pre-training for 3d understanding
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes their 2D …
representation learning by aligning multimodal features across 3D shapes their 2D …
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …
research attention but two unresolved and critical issues demand further investigation. 1) …
Shapellm: Universal 3d object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
Towards open vocabulary learning: A survey
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …
advancements in various core tasks like segmentation, tracking, and detection. However …
Distilling large vision-language model with out-of-distribution generalizability
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …
computational requirements make their deployment on resource-constrained devices and …
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Abstract 3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in develo** embodied agents. In …
physical environments, stands as a cornerstone in develo** embodied agents. In …
Michelangelo: Conditional 3d shape generation based on shape-image-text aligned latent representation
We present a novel alignment-before-generation approach to tackle the challenging task of
generating general 3D shapes based on 2D images or texts. Directly learning a conditional …
generating general 3D shapes based on 2D images or texts. Directly learning a conditional …