Probing the 3D Awareness of Visual Foundation Models
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …
PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale
benchmark designed to advance the development and evaluation of pose estimation …
SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild
We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and
illumination from object images captured with varying lighting, pose, and background. Inverse …
Customizing Text-to-Image Diffusion with Camera Viewpoint Control
Model customization introduces new concepts to existing text-to-image models, enabling the
generation of the new concept in novel contexts. However, such methods lack accurate …
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors
Recent works in inverse rendering have shown promise in using multi-view images of an
object to recover shape, albedo, and materials. However, the recovered components often …
Toward a Holistic Evaluation of Robustness in CLIP Models
W Tu, W Deng, T Gedeon - arXiv preprint arXiv:2410.01534, 2024 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) models have shown significant potential,
particularly in zero-shot classification across diverse distribution shifts. Building on existing …
RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models
Agglomerative models have recently emerged as a powerful approach to training vision
foundation models, leveraging multi-teacher distillation from existing models such as CLIP …
3D Congealing: 3D-Aware Image Alignment in the Wild
We propose 3D Congealing, a novel problem of 3D-aware alignment for 2D images
capturing semantically similar objects. Given a collection of unlabeled Internet images, our …
LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals
Finding localized correspondences across different images of the same object is crucial to
understand its geometry. In recent years, this problem has seen remarkable progress with …