Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Recent developments in monocular depth estimation methods enable high-quality depth
estimation of single-view images but fail to estimate consistent video depth across different …
InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction
Realizing scaling laws in embodied AI has become a focus. However, previous work has
been scattered across diverse simulation platforms, with assets and models lacking unified …
Survey on Monocular Metric Depth Estimation
J Zhang - arXiv preprint arXiv:2501.11841, 2025 - arxiv.org
Monocular Depth Estimation (MDE) is a fundamental computer vision task underpinning
applications such as spatial understanding, 3D reconstruction, and autonomous driving …
Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos
Learning to understand dynamic 3D scenes from imagery is crucial for applications ranging
from robotics to scene reconstruction. Yet, unlike other problems where large-scale …
Local Policies Enable Zero-shot Long-horizon Manipulation
Sim2real for robotic manipulation is difficult due to the challenges of simulating complex
contacts and generating realistic task distributions. To tackle the latter problem, we introduce …
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
In this paper, we propose VistaDream, a novel framework to reconstruct a 3D scene from a
single-view image. Recent diffusion models enable generating high-quality novel-view …
DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models
H Kim, S Beak, H Joo - arXiv preprint arXiv:2501.08333, 2025 - arxiv.org
Understanding the ability of humans to use objects is crucial for AI to improve daily life.
Existing studies for learning such ability focus on human-object patterns (e.g., contact, spatial …
FoundationStereo: Zero-Shot Stereo Matching
Tremendous progress has been made in deep stereo matching to excel on benchmark
datasets through per-domain fine-tuning. However, achieving strong zero-shot …
MultiDepth: Multi-Sample Priors for Refining Monocular Metric Depth Estimations in Indoor Scenes
Monocular metric depth estimation (MMDE) is a crucial task to solve for indoor scene
reconstruction on edge devices. Despite this importance, existing models are sensitive to …
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
Diffusion models have demonstrated impressive performance in generating high-quality
videos from text prompts or images. However, precise control over the video generation …