DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Large-scale pre-trained vision models are becoming increasingly prevalent, offering
expressive and generalizable visual representations that benefit various downstream tasks …
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Diffusion models have demonstrated impressive performance in generating high-quality
videos from text prompts or images. However, precise control over the video generation …
Exploring Representation-Aligned Latent Space for Better Generation
Generative models serve as powerful tools for modeling the real world, with mainstream
diffusion models, particularly those based on the latent diffusion model paradigm, achieving …
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Visual localization aims to determine the camera pose of a query image relative to a
database of posed images. In recent years, deep neural networks that directly regress …
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
Monocular depth estimation (MDE) models have undergone significant advancements over
recent years. Many MDE models aim to predict affine-invariant relative depth from …
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
In this paper, we introduce SLAM3R, a novel and effective monocular RGB SLAM
system for real-time and high-quality dense 3D reconstruction. SLAM3R provides an end-to …