Google Scholar

Tracking and mapping in medical computer vision: A review

A Schmidt, O Mohareri, S DiMaio, MC Yip… - Medical Image …, 2024 - Elsevier

As computer vision algorithms increase in capability, their applications in clinical systems
will become more pervasive. These applications include: diagnostics, such as colonoscopy …

Cited by 35

Deep learning-based depth estimation methods from monocular image and videos: A comprehensive survey

U Rajapaksha, F Sohel, H Laga, D Diepeveen… - ACM Computing …, 2024 - dl.acm.org

Estimating depth from single RGB images and videos is of widespread interest due to its
applications in many areas, including autonomous driving, 3D reconstruction, digital …

Cited by 8

Blink: Multimodal large language models can see but not perceive

X Fu, Y Hu, B Li, Y Feng, H Wang, X Lin, D Roth… - … on Computer Vision, 2024 - Springer

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses
on core visual perception abilities not found in other evaluations. Most of the Blink tasks can …

Cited by 104

SpatialRGPT: Grounded spatial reasoning in vision-language models

AC Cheng, H Yin, Y Fu, Q Guo… - Advances in …, 2025 - proceedings.neurips.cc

Vision Language Models (VLMs) have demonstrated remarkable performance in
2D vision and language tasks. However, their ability to reason about spatial arrangements …

Cited by 41

FSGS: Real-time few-shot view synthesis using Gaussian splatting

Z Zhu, Z Fan, Y Jiang, Z Wang - European conference on computer vision, 2024 - Springer

Novel view synthesis from limited observations remains a crucial and ongoing challenge. In
the realm of NeRF-based few-shot view synthesis, there is often a trade-off between the …

Cited by 103

GeoWizard: Unleashing the diffusion priors for 3D geometry estimation from a single image

X Fu, W Yin, M Hu, K Wang, Y Ma, P Tan… - … on Computer Vision, 2024 - Springer

We introduce GeoWizard, a new generative foundation model designed for estimating
geometric attributes, e.g., depth and normals, from single images. While significant research …

Cited by 63

Visual sketchpad: Sketching as a visual chain of thought for multimodal language models

Y Hu, W Shi, X Fu, D Roth… - Advances in …, 2025 - proceedings.neurips.cc

Humans draw to facilitate reasoning: we draw auxiliary lines when solving geometry
problems; we mark and circle when reasoning on maps; we use sketches to amplify our …

Cited by 30

Zero-shot image editing with reference imitation

X Chen, Y Feng, M Chen, Y Wang… - Advances in …, 2025 - proceedings.neurips.cc

Image editing serves as a practical yet challenging task considering the diverse demands
from users, where one of the hardest parts is to precisely describe how the edited image …

Cited by 15

DreamScene4D: Dynamic multi-object scene generation from monocular videos

WH Chu, L Ke, K Fragkiadaki - Advances in Neural …, 2025 - proceedings.neurips.cc

View-predictive generative models provide strong priors for lifting object-centric images and
videos into 3D and 4D through rendering and score distillation objectives. A question then …

Cited by 26

Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation

M Hu, W Yin, C Zhang, Z Cai, X Long… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric
depth and surface normal estimation from single images, critical for accurate 3D recovery …

Cited by 31

Depth anything: Unleashing the power of large-scale unlabeled data

Tracking and mapping in medical computer vision: A review
