VLP: A survey on vision-language pre-training
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …
NeRF: Neural radiance field in 3D vision, a comprehensive review
Neural Radiance Field (NeRF), a new novel view synthesis with implicit scene
representation has taken the field of Computer Vision by storm. As a novel view synthesis …
Objaverse-XL: A universe of 10M+ 3D objects
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …
Emergent correspondence from image diffusion
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …
Generative novel view synthesis with 3D-aware diffusion models
We present a diffusion-based model for 3D-aware generative novel view synthesis from as
few as a single input image. Our model samples from the distribution of possible renderings …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
Occ3D: A large-scale 3D occupancy prediction benchmark for autonomous driving
Robotic perception requires the modeling of both 3D geometry and semantics. Existing
methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details …
OpenScene: 3D scene understanding with open vocabularies
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a
model for a single task with supervision. We propose OpenScene, an alternative approach …
ScanNet++: A high-fidelity dataset of 3D indoor scenes
We present ScanNet++, a large-scale dataset that couples together capture of high-quality
and commodity-level geometry and color of indoor scenes. Each scene is captured with a …
MVImgNet: A large-scale dataset of multi-view images
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth
of ImageNet drives a remarkable trend of "learning from large-scale data" in computer vision …