Deep learning-based depth estimation methods from monocular image and videos: A comprehensive survey
Estimating depth from single RGB images and videos is of widespread interest due to its
applications in many areas, including autonomous driving, 3D reconstruction, digital …
Blink: Multimodal large language models can see but not perceive
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …
Fsgs: Real-time few-shot view synthesis using gaussian splatting
Novel view synthesis from limited observations remains a crucial and ongoing challenge. In
the realm of NeRF-based few-shot view synthesis, there is often a trade-off between the …
GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image
We introduce GeoWizard, a new generative foundation model designed for estimating
geometric attributes, e.g., depth and normals, from single images. While significant research …
Sapiens: Foundation for human vision models
We present Sapiens, a family of models for four fundamental human-centric vision tasks–2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …
Tracking and mapping in medical computer vision: A review
As computer vision algorithms increase in capability, their applications in clinical systems
will become more pervasive. These applications include: diagnostics, such as colonoscopy …
Depth Pro: Sharp monocular metric depth in less than a second
A Bochkovskii, A Delaunoy, H Germain… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a foundation model for zero-shot metric monocular depth estimation. Our model,
Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high …
NTIRE 2024 challenge on HR depth from images of specular and transparent surfaces
This paper reports on the NTIRE 2024 challenge on HR Depth From images of Specular and
Transparent surfaces held in conjunction with the New Trends in Image Restoration and …
Coherentgs: Sparse novel view synthesis with coherent 3d gaussians
The field of 3D reconstruction from images has rapidly evolved in the past few years, first
with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian …
Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …