ZoeDepth: Zero-shot transfer by combining relative and metric depth
This paper tackles the problem of depth estimation from a single image. Existing work either
focuses on generalization performance disregarding metric scale, i.e. relative depth …
MultiMAE: Multi-modal multi-task masked autoencoders
We propose a pre-training strategy called Multi-modal Multi-task Masked Autoencoders
(MultiMAE). It differs from standard Masked Autoencoding in two key aspects: I) it can …
NeuralLift-360: Lifting an in-the-wild 2D photo to a 3D object with 360° views
Virtual reality and augmented reality (XR) bring increasing demand for 3D content
generation. However, creating high-quality 3D content requires tedious work from a human …
Repurposing diffusion-based image generators for monocular depth estimation
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding so it is not …
Unsupervised scale-consistent depth and ego-motion learning from monocular video
Recent work has shown that CNN-based depth and ego-motion estimators can be learned
using unlabelled monocular videos. However, the performance is limited by unidentified …
GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image
We introduce GeoWizard, a new generative foundation model designed for estimating
geometric attributes, e.g., depth and normals, from single images. While significant research …
Metric3D: Towards zero-shot metric 3D prediction from a single image
Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-
posedness of the single-image reconstruction problem, most well-established methods are …
SinNeRF: Training neural radiance fields on complex scenes from a single image
Despite the rapid development of Neural Radiance Field (NeRF), the necessity of dense
covers largely prohibits its wider applications. While several recent works have attempted to …
P3Depth: Monocular depth estimation with a piecewise planarity prior
Monocular depth estimation is vital for scene understanding and downstream tasks. We
focus on the supervised setup, in which ground-truth depth is available only at training time …
Text2NeRF: Text-driven 3D scene generation with neural radiance fields
Text-driven 3D scene generation is widely applicable to video gaming, film industry, and
metaverse applications that have a large demand for 3D scenes. However, existing text-to …