iDisc: Internal discretization for monocular depth estimation
Monocular depth estimation is fundamental for 3D scene understanding and downstream
applications. However, even under the supervised setup, it is still challenging and ill-posed …
Neural 3D scene reconstruction with the Manhattan-world assumption
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view
images. Many previous works have shown impressive reconstruction results on textured …
HuMoR: 3D human motion model for robust pose estimation
We introduce HuMoR: a 3D Human Motion Model for Robust Estimation of temporal pose
and shape. Though substantial progress has been made in estimating 3D human motion …
P3Depth: Monocular depth estimation with a piecewise planarity prior
Monocular depth estimation is vital for scene understanding and downstream tasks. We
focus on the supervised setup, in which ground-truth depth is available only at training time …
NDDepth: Normal-distance assisted monocular depth estimation
Monocular depth estimation has drawn widespread attention from the vision community due
to its broad applications. In this paper, we propose a novel physics (geometry)-driven deep …
Structured3D: A large photo-realistic dataset for structured 3D modeling
Recently, there has been growing interest in developing learning-based methods to detect
and utilize salient semi-global or global structures, such as junctions, lines, planes, cuboids …
AffordanceLLM: Grounding affordance from vision language models
Affordance grounding refers to the task of finding the area of an object with which one can
interact. It is a fundamental but challenging task as a successful solution requires the …
Guiding monocular depth estimation using depth-attention volume
Recovering the scene depth from a single image is an ill-posed problem that requires
additional priors, often referred to as monocular depth cues, to disambiguate different 3D …
Novel view synthesis of dynamic scenes with globally coherent depths from a monocular camera
This paper presents a new method to synthesize an image from arbitrary views and times
given a collection of images of a dynamic scene. A key challenge for the novel view …
Map-free visual relocalization: Metric pose relative to a single image
Can we relocalize in a scene represented by a single reference image? Standard visual
relocalization requires hundreds of images and scale calibration to build a scene-specific …