Vlp: A survey on vision-language pre-training

FL Chen, DZ Zhang, ML Han, XY Chen, J Shi… - Machine Intelligence …, 2023 - Springer
In the past few years, the emergence of pre-training models has brought uni-modal fields
such as computer vision (CV) and natural language processing (NLP) to a new era …

Nerf: Neural radiance field in 3d vision, a comprehensive review

K Gao, Y Gao, H He, D Lu, L Xu, J Li - arxiv preprint arxiv:2210.00379, 2022 - arxiv.org
Neural Radiance Field (NeRF), a new novel view synthesis with implicit scene
representation has taken the field of Computer Vision by storm. As a novel view synthesis …

Objaverse-xl: A universe of 10m+ 3d objects

M Deitke, R Liu, M Wallingford, H Ngo… - Advances in …, 2024 - proceedings.neurips.cc
Natural language processing and 2D vision models have attained remarkable proficiency on
many tasks primarily by escalating the scale of training data. However, 3D vision tasks have …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Generative novel view synthesis with 3d-aware diffusion models

ER Chan, K Nagano, MA Chan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a diffusion-based model for 3D-aware generative novel view synthesis from as
few as a single input image. Our model samples from the distribution of possible renderings …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Occ3d: A large-scale 3d occupancy prediction benchmark for autonomous driving

X Tian, T Jiang, L Yun, Y Mao, H Yang… - Advances in …, 2024 - proceedings.neurips.cc
Robotic perception requires the modeling of both 3D geometry and semantics. Existing
methods typically focus on estimating 3D bounding boxes, neglecting finer geometric details …

Openscene: 3d scene understanding with open vocabularies

S Peng, K Genova, C Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a
model for a single task with supervision. We propose OpenScene, an alternative approach …

Scannet++: A high-fidelity dataset of 3d indoor scenes

C Yeshwanth, YC Liu, M Nießner… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ScanNet++, a large-scale dataset that couples together capture of high-quality
and commodity-level geometry and color of indoor scenes. Each scene is captured with a …

Mvimgnet: A large-scale dataset of multi-view images

X Yu, M Xu, Y Zhang, H Liu, C Ye… - Proceedings of the …, 2023 - openaccess.thecvf.com
Being data-driven is one of the most iconic properties of deep learning algorithms. The birth
of ImageNet drives a remarkable trend of" learning from large-scale data" in computer vision …