
Deep learning-based depth estimation methods from monocular image and videos: A comprehensive survey

U Rajapaksha, F Sohel, H Laga, D Diepeveen… - ACM Computing …, 2024 - dl.acm.org

Estimating depth from single RGB images and videos is of widespread interest due to its
applications in many areas, including autonomous driving, 3D reconstruction, digital …

Cited by 7 · All 7 versions


[PDF] arxiv.org

Blink: Multimodal large language models can see but not perceive

X Fu, Y Hu, B Li, Y Feng, H Wang, X Lin, D Roth… - … on Computer Vision, 2024 - Springer

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …

Cited by 94 · All 2 versions


[PDF] arxiv.org

FSGS: Real-time few-shot view synthesis using Gaussian splatting

Z Zhu, Z Fan, Y Jiang, Z Wang - European conference on computer vision, 2024 - Springer

Novel view synthesis from limited observations remains a crucial and ongoing challenge. In
the realm of NeRF-based few-shot view synthesis, there is often a trade-off between the …

Cited by 94 · All 2 versions

GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image

X Fu, W Yin, M Hu, K Wang, Y Ma, P Tan… - … on Computer Vision, 2024 - Springer

We introduce GeoWizard, a new generative foundation model designed for estimating
geometric attributes, e.g., depth and normals, from single images. While significant research …

Cited by 58 · All 2 versions


[PDF] arxiv.org

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer

We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

Cited by 29 · All 3 versions


[HTML] sciencedirect.com

[HTML] Tracking and mapping in medical computer vision: A review

A Schmidt, O Mohareri, S DiMaio, MC Yip… - Medical Image …, 2024 - Elsevier

As computer vision algorithms increase in capability, their applications in clinical systems
will become more pervasive. These applications include: diagnostics, such as colonoscopy …

Cited by 30 · All 5 versions


[PDF] arxiv.org

Depth Pro: Sharp monocular metric depth in less than a second

A Bochkovskii, A Delaunoy, H Germain… - arXiv preprint arXiv …, 2024 - arxiv.org

We present a foundation model for zero-shot metric monocular depth estimation. Our model,
Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and high …

Cited by 31


[PDF] thecvf.com

NTIRE 2024 challenge on HR depth from images of specular and transparent surfaces

PZ Ramirez, F Tosi, L Di Stefano… - Proceedings of the …, 2024 - openaccess.thecvf.com

This paper reports on the NTIRE 2024 challenge on HR Depth From images of Specular and
Transparent surfaces held in conjunction with the New Trends in Image Restoration and …

Cited by 30 · All 6 versions


[PDF] pkwyx.com

CoherentGS: Sparse novel view synthesis with coherent 3D Gaussians

A Paliwal, W Ye, J Xiong, D Kotovenko… - … on Computer Vision, 2024 - Springer

The field of 3D reconstruction from images has rapidly evolved in the past few years, first
with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian …

Cited by 17 · All 2 versions


[PDF] arxiv.org

Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining

D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Lumina-mGPT, a family of multimodal autoregressive models capable of various
vision and language tasks, particularly excelling in generating flexible photorealistic images …

Cited by 26
