Deep learning for monocular depth estimation: A review
Depth estimation is a classic task in computer vision, which is of great significance for many
applications such as augmented reality, target tracking and autonomous driving. Traditional …
Monocular depth estimation based on deep learning: An overview
Depth information is important for autonomous systems to perceive environments and
estimate their own state. Traditional depth estimation methods, like structure from motion …
Depth anything: Unleashing the power of large-scale unlabeled data
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …
Blink: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
SparseNeRF: Distilling depth ranking for few-shot novel view synthesis
Neural Radiance Field (NeRF) significantly degrades when only a limited number
of views are available. To complement the lack of 3D information, depth-based models, such …
Repurposing diffusion-based image generators for monocular depth estimation
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding, so it is not …
NeuralLift-360: Lifting an in-the-wild 2D photo to a 3D object with 360° views
Virtual reality and augmented reality (XR) bring increasing demand for 3D content
generation. However, creating high-quality 3D content requires tedious work from a human …
Metric3D: Towards zero-shot metric 3D prediction from a single image
Reconstructing accurate 3D scenes from images is a long-standing vision task. Due to the ill-
posedness of the single-image reconstruction problem, most well-established methods are …
Omnidata: A scalable pipeline for making multi-task mid-level vision datasets from 3D scans
Computer vision now relies on data, but we know surprisingly little about what factors in the
data affect performance. We argue that this stems from the way data is collected. Designing …
The temporal opportunist: Self-supervised multi-frame monocular depth
Self-supervised monocular depth estimation networks are trained to predict scene depth
using nearby frames as a supervision signal during training. However, for many …