DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

D Danier, M Aygün, C Li, H Bilen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained vision models are becoming increasingly prevalent, offering
expressive and generalizable visual representations that benefit various downstream tasks …

Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control

Z Gu, R Yan, J Lu, P Li, Z Dou, C Si, Z Dong… - arXiv preprint arXiv …, 2025 - arxiv.org
Diffusion models have demonstrated impressive performance in generating high-quality
videos from text prompts or images. However, precise control over the video generation …

Exploring Representation-Aligned Latent Space for Better Generation

W Xu, X Yue, Z Wang, Y Teng, W Zhang, X Liu… - arXiv preprint arXiv …, 2025 - arxiv.org
Generative models serve as powerful tools for modeling the real world, with mainstream
diffusion models, particularly those based on the latent diffusion model paradigm, achieving …

Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization

S Dong, S Wang, S Liu, L Cai, Q Fan, J Kannala… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual localization aims to determine the camera pose of a query image relative to a
database of posed images. In recent years, deep neural networks that directly regress …

Relative Pose Estimation through Affine Corrections of Monocular Depth Priors

Y Yu, S Liu, R Pautrat, M Pollefeys… - arXiv preprint arXiv …, 2025 - arxiv.org
Monocular depth estimation (MDE) models have undergone significant advancements over
recent years. Many MDE models aim to predict affine-invariant relative depth from …

SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos

Y Liu, S Dong, S Wang, Y Yin, Y Yang, Q Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce SLAM3R, a novel and effective monocular RGB SLAM
system for real-time and high-quality dense 3D reconstruction. SLAM3R provides an end-to …