Deep learning for 3D object recognition: A survey
With the growing availability of extensive 3D datasets and the rapid progress in
computational power, deep learning (DL) has emerged as a highly promising approach for …
DepthCrafter: Generating consistent long depth sequences for open-world videos
Despite significant advancements in monocular depth estimation for static images,
estimating video depth in the open world remains challenging, since open-world videos are …
MonST3R: A simple approach for estimating geometry in the presence of motion
Estimating geometry from dynamic scenes, where objects move and deform over time,
remains a core challenge in computer vision. Current approaches often rely on multi-stage …
Lotus: Diffusion-based visual foundation model for high-quality dense prediction
Leveraging the visual priors of pre-trained text-to-image diffusion models offers a promising
solution to enhance zero-shot generalization in dense prediction tasks. However, existing …
UniMatch V2: Pushing the limit of semi-supervised semantic segmentation
Semi-supervised semantic segmentation (SSS) aims at learning rich visual knowledge from
cheap unlabeled images to enhance semantic segmentation capability. Among recent …
Dynamic Gaussian Marbles for novel view synthesis of casual monocular videos
Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting
clear strengths in efficiency, photometric quality, and compositional editability. Following its …
MoGe: Unlocking accurate monocular geometry estimation for open-domain images with optimal training supervision
We present MoGe, a powerful model for recovering 3D geometry from monocular open-
domain images. Given a single image, our model directly predicts a 3D point map of the …
Compressed depth map super-resolution and restoration: AIM 2024 challenge results
The increasing demand for augmented reality (AR) and virtual reality (VR) applications
highlights the need for efficient depth information processing. Depth maps, essential for …
Unveiling deep shadows: A survey on image and video shadow detection, removal, and generation in the era of deep learning
Shadows are formed when light encounters obstacles, leading to areas of diminished
illumination. In computer vision, shadow detection, removal, and generation are crucial for …
PixWizard: Versatile image-to-image visual assistant with open-language instructions
This paper presents a versatile image-to-image visual assistant, PixWizard, designed for
image generation, manipulation, and translation based on free-form language instructions …