Deep learning-based depth estimation methods from monocular image and videos: A comprehensive survey

U Rajapaksha, F Sohel, H Laga, D Diepeveen… - ACM Computing …, 2024 - dl.acm.org
Estimating depth from single RGB images and videos is of widespread interest due to its
applications in many areas, including autonomous driving, 3D reconstruction, digital …

Blink: Multimodal large language models can see but not perceive

X Fu, Y Hu, B Li, Y Feng, H Wang, X Lin, D Roth… - … on Computer Vision, 2024 - Springer
We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …

FSGS: Real-time few-shot view synthesis using Gaussian splatting

Z Zhu, Z Fan, Y Jiang, Z Wang - European conference on computer vision, 2024 - Springer
Novel view synthesis from limited observations remains a crucial and ongoing challenge. In
the realm of NeRF-based few-shot view synthesis, there is often a trade-off between the …

GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image

X Fu, W Yin, M Hu, K Wang, Y Ma, P Tan… - … on Computer Vision, 2024 - Springer
We introduce GeoWizard, a new generative foundation model designed for estimating
geometric attributes, e.g., depth and normals, from single images. While significant research …

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer
We present Sapiens, a family of models for four fundamental human-centric vision tasks–2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

Tracking and mapping in medical computer vision: A review

A Schmidt, O Mohareri, S DiMaio, MC Yip… - Medical Image …, 2024 - Elsevier
As computer vision algorithms increase in capability, their applications in clinical systems
will become more pervasive. These applications include: diagnostics, such as colonoscopy …

Depth Pro: Sharp monocular metric depth in less than a second

A Bochkovskii, A Delaunoy, H Germain… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a foundation model for zero-shot metric monocular depth estimation. Our model,
Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high …

NTIRE 2024 challenge on HR depth from images of specular and transparent surfaces

PZ Ramirez, F Tosi, L Di Stefano… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reports on the NTIRE 2024 challenge on HR Depth From images of Specular and
Transparent surfaces held in conjunction with the New Trends in Image Restoration and …

CoherentGS: Sparse novel view synthesis with coherent 3D Gaussians

A Paliwal, W Ye, J Xiong, D Kotovenko… - … on Computer Vision, 2024 - Springer
The field of 3D reconstruction from images has rapidly evolved in the past few years, first
with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian …

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …