Google Scholar

ViTamin: Designing scalable vision models in the vision-language era

J Chen, Q Yu, X Shen, A Yuille… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent breakthroughs in vision-language models (VLMs) start a new page in the vision
community. The VLMs provide stronger and more generalizable feature embeddings …

Cited by 13 · All 9 versions

[PDF] arxiv.org

Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation

M Hu, W Yin, C Zhang, Z Cai, X Long… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric
depth and surface normal estimation from single images, critical for accurate 3D recovery …

Cited by 29 · All 9 versions

[PDF] thecvf.com

COCONut: Modernizing COCO segmentation

X Deng, Q Yu, P Wang, X Shen… - Proceedings of the …, 2024 - openaccess.thecvf.com

In recent decades the vision community has witnessed remarkable progress in visual
recognition partially owing to advancements in dataset benchmarks. Notably the established …

Cited by 10 · All 7 versions

[PDF] arxiv.org

GeminiFusion: Efficient pixel-wise multimodal fusion for vision transformer

D Jia, J Guo, K Han, H Wu, C Zhang, C Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

Cross-modal transformers have demonstrated superiority in various vision tasks by
effectively integrating different modalities. This paper first critiques prior token exchange …

Cited by 9 · All 7 versions

[PDF] arxiv.org

InvPT++: Inverted pyramid multi-task transformer for visual scene understanding

H Ye, D Xu - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org

Multi-task scene understanding aims to design models that can simultaneously predict
several scene understanding tasks with one versatile model. Previous studies typically …

Cited by 11 · All 8 versions

[PDF] arxiv.org

HAPNet: Toward superior RGB-thermal scene parsing via hybrid, asymmetric, and progressive heterogeneous feature fusion

J Li, P Yun, Q Chen, R Fan - arXiv preprint arXiv:2404.03527, 2024 - arxiv.org

Data-fusion networks have shown significant promise for RGB-thermal scene parsing.
However, the majority of existing studies have relied on symmetric duplex encoders for …

Cited by 8 · All 2 versions

[PDF] arxiv.org

3D human reconstruction in the wild with synthetic data using generative models

Y Ge, W Wang, Y Chen, H Chen, C Shen - arXiv preprint arXiv:2403.11111, 2024 - arxiv.org

In this work, we show that synthetic data created by generative models is complementary to
computer graphics (CG) rendered data for achieving remarkable generalization …

Cited by 8 · All 4 versions

Uni-EPM: A Unified Extensible Perception Model Without Labeling Everything

Y Gao, S Mu, S Xu - IEEE Transactions on Intelligent …, 2024 - ieeexplore.ieee.org

Multi-task perception system to simultaneously perceive various kinds of objects is essential
for autonomous driving. Existing perception frameworks always rely on multi-labeled …

All 3 versions

Fine-tuned depth-augmented U-Net for enhanced semantic segmentation in indoor autonomous vision systems

HN Tran, TAN Le, NV Nguyen, NT Nguyen… - Journal of Real-Time …, 2025 - Springer

Recent technological advancements have significantly improved indoor autonomous vision
systems (IAVSs), underscoring the critical need to enhance their capability to interpret real …

All 2 versions

[PDF] arxiv.org

Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks

A Quercia, E Yildiz, Z Cao, K Krajsek… - arXiv preprint arXiv …, 2025 - arxiv.org

Monocular depth estimation (MDE) is a challenging task in computer vision, often hindered
by the cost and scarcity of high-quality labeled datasets. We tackle this challenge using …

All 2 versions
