Deep learning for 3D object recognition: A survey

AAM Muzahid, H Han, Y Zhang, D Li, Y Zhang… - Neurocomputing, 2024 - Elsevier
With the growing availability of extensive 3D datasets and the rapid progress in
computational power, deep learning (DL) has emerged as a highly promising approach for …

DepthCrafter: Generating consistent long depth sequences for open-world videos

W Hu, X Gao, X Li, S Zhao, X Cun, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant advancements in monocular depth estimation for static images,
estimating video depth in the open world remains challenging, since open-world videos are …

MonST3R: A simple approach for estimating geometry in the presence of motion

J Zhang, C Herrmann, J Hur, V Jampani… - arXiv preprint arXiv …, 2024 - arxiv.org
Estimating geometry from dynamic scenes, where objects move and deform over time,
remains a core challenge in computer vision. Current approaches often rely on multi-stage …

Lotus: Diffusion-based visual foundation model for high-quality dense prediction

J He, H Li, W Yin, Y Liang, L Li, K Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …

UniMatch V2: Pushing the limit of semi-supervised semantic segmentation

L Yang, Z Zhao, H Zhao - IEEE Transactions on Pattern …, 2025 - ieeexplore.ieee.org
Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from
cheap unlabeled images to enhance semantic segmentation capability. Among recent …

Dynamic gaussian marbles for novel view synthesis of casual monocular videos

C Stearns, A Harley, M Uy, F Dubost… - SIGGRAPH Asia 2024 …, 2024 - dl.acm.org
Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting
clear strengths in efficiency, photometric quality, and compositional editability. Following its …

MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision

R Wang, S Xu, C Dai, J Xiang, Y Deng, X Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
We present MoGe, a powerful model for recovering 3D geometry from monocular open-
domain images. Given a single image, our model directly predicts a 3D point map of the …

Compressed depth map super-resolution and restoration: AIM 2024 challenge results

MV Conde, FA Vasluianu, J Xiong, W Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
The increasing demand for augmented reality (AR) and virtual reality (VR) applications
highlights the need for efficient depth information processing. Depth maps, essential for …

Unveiling deep shadows: A survey on image and video shadow detection, removal, and generation in the era of deep learning

X Hu, Z Xing, T Wang, CW Fu, PA Heng - arXiv preprint arXiv:2409.02108, 2024 - arxiv.org
Shadows are formed when light encounters obstacles, leading to areas of diminished
illumination. In computer vision, shadow detection, removal, and generation are crucial for …

PixWizard: Versatile image-to-image visual assistant with open-language instructions

W Lin, X Wei, R Zhang, L Zhuo, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a versatile image-to-image visual assistant, PixWizard, designed for
image generation, manipulation, and translation based on free-form language instructions …