ViTamin: Designing scalable vision models in the vision-language era
Recent breakthroughs in vision-language models (VLMs) start a new page in the vision
community. The VLMs provide stronger and more generalizable feature embeddings …
Metric3D v2: A versatile monocular geometric foundation model for zero-shot metric depth and surface normal estimation
We introduce Metric3D v2, a geometric foundation model designed for zero-shot metric
depth and surface normal estimation from single images, critical for accurate 3D recovery …
COCONut: Modernizing COCO segmentation
In recent decades the vision community has witnessed remarkable progress in visual
recognition partially owing to advancements in dataset benchmarks. Notably the established …
GeminiFusion: Efficient pixel-wise multimodal fusion for vision transformer
Cross-modal transformers have demonstrated superiority in various vision tasks by
effectively integrating different modalities. This paper first critiques prior token exchange …
InvPT++: Inverted pyramid multi-task transformer for visual scene understanding
Multi-task scene understanding aims to design models that can simultaneously predict
several scene understanding tasks with one versatile model. Previous studies typically …
HAPNet: Toward superior RGB-thermal scene parsing via hybrid, asymmetric, and progressive heterogeneous feature fusion
Data-fusion networks have shown significant promise for RGB-thermal scene parsing.
However, the majority of existing studies have relied on symmetric duplex encoders for …
3D human reconstruction in the wild with synthetic data using generative models
In this work, we show that synthetic data created by generative models is complementary to
computer graphics (CG) rendered data for achieving remarkable generalization …
Uni-EPM: A Unified Extensible Perception Model Without Labeling Everything
Multi-task perception system to simultaneously perceive various kinds of objects is essential
for autonomous driving. Existing perception frameworks always rely on multi-labeled …
Fine-tuned depth-augmented U-Net for enhanced semantic segmentation in indoor autonomous vision systems
Recent technological advancements have significantly improved indoor autonomous vision
systems (IAVSs), underscoring the critical need to enhance their capability to interpret real …
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Monocular depth estimation (MDE) is a challenging task in computer vision, often hindered
by the cost and scarcity of high-quality labeled datasets. We tackle this challenge using …