Probing the 3D Awareness of Visual Foundation Models
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …
PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale
benchmark designed to advance the development and evaluation of pose estimation …
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and
illumination from object images captured with varying lighting, pose, and background. Inverse …
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis
Q Zhao, S Tulsiani - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Inferring the 3D structure underlying a set of multi-view images typically requires solving two
co-dependent tasks--accurate 3D reconstruction requires precise camera poses, and …
Customizing Text-to-Image Diffusion with Camera Viewpoint Control
Model customization introduces new concepts to existing text-to-image models, enabling the
generation of the new concept in novel contexts. However, such methods lack accurate …
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
Recent works in inverse rendering have shown promise in using multi-view images of an
object to recover shape, albedo, and materials. However, the recovered components often …
Toward a Holistic Evaluation of Robustness in CLIP Models
W Tu, W Deng, T Gedeon - arXiv preprint arXiv:2410.01534, 2024 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) models have shown significant potential,
particularly in zero-shot classification across diverse distribution shifts. Building on existing …
RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models
Agglomerative models have recently emerged as a powerful approach to training vision
foundation models, leveraging multi-teacher distillation from existing models such as CLIP …
3D Congealing: 3D-Aware Image Alignment in the Wild
We propose 3D Congealing, a novel problem of 3D-aware alignment for 2D images
capturing semantically similar objects. Given a collection of unlabeled Internet images, our …