Probing the 3D Awareness of Visual Foundation Models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

Y You, K Xiong, Z Yang, Z Huang, J Zhou, R Shi… - … on Computer Vision, 2024 - Springer
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale
benchmark designed to advance the development and evaluation of pose estimation …

SHINOBI: Shape and Illumination using Neural Object Decomposition via BRDF Optimization In-the-wild

A Engelhardt, A Raj, M Boss, Y Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present SHINOBI, an end-to-end framework for the reconstruction of shape, material, and
illumination from object images captured with varying lighting, pose, and background. Inverse …

Customizing Text-to-Image Diffusion with Camera Viewpoint Control

N Kumari, G Su, R Zhang, T Park, E Shechtman… - arXiv preprint arXiv …, 2024 - arxiv.org
Model customization introduces new concepts to existing text-to-image models, enabling the
generation of the new concept in novel contexts. However, such methods lack accurate …

MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors

Y Litman, O Patashnik, K Deng, A Agrawal… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent works in inverse rendering have shown promise in using multi-view images of an
object to recover shape, albedo, and materials. However, the recovered components often …

Toward a Holistic Evaluation of Robustness in CLIP Models

W Tu, W Deng, T Gedeon - arXiv preprint arXiv:2410.01534, 2024 - arxiv.org
Contrastive Language-Image Pre-training (CLIP) models have shown significant potential,
particularly in zero-shot classification across diverse distribution shifts. Building on existing …

RADIO Amplified: Improved Baselines for Agglomerative Vision Foundation Models

G Heinrich, M Ranzinger, Y Lu, J Kautz, A Tao… - arXiv preprint arXiv …, 2024 - arxiv.org
Agglomerative models have recently emerged as a powerful approach to training vision
foundation models, leveraging multi-teacher distillation from existing models such as CLIP …

3D Congealing: 3D-Aware Image Alignment in the Wild

Y Zhang, Z Li, A Raj, A Engelhardt, Y Li, T Hou… - … on Computer Vision, 2024 - Springer
We propose 3D Congealing, a novel problem of 3D-aware alignment for 2D images
capturing semantically similar objects. Given a collection of unlabeled Internet images, our …

PACE: A Large-Scale Dataset with Pose Annotations in Cluttered Environments

Y You, K Xiong, Z Yang, Z Huang, J Zhou, R Shi… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce PACE (Pose Annotations in Cluttered Environments), a large-scale benchmark
designed to advance the development and evaluation of pose estimation methods in …

LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals

A Karpur, G Perrotta, R Martin-Brualla… - … Conference on 3D …, 2024 - ieeexplore.ieee.org
Finding localized correspondences across different images of the same object is crucial to
understand its geometry. In recent years, this problem has seen remarkable progress with …