SAVi++: Towards end-to-end object-centric learning from real-world videos
The visual world can be parsimoniously characterized in terms of distinct entities with sparse
interactions. Discovering this compositional structure in dynamic visual scenes has proven …
ReCo: Retrieve and co-segment for zero-shot transfer
Semantic segmentation has a broad range of applications, but its real-world impact has
been significantly limited by the prohibitive annotation costs necessary to enable …
POP-3D: Open-vocabulary 3D occupancy prediction from images
We describe an approach to predict an open-vocabulary 3D semantic voxel occupancy map
from input 2D images with the objective of enabling 3D grounding, segmentation and …
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation
Sound can convey significant information for spatial reasoning in our daily lives. To endow
deep networks with such ability, we address the challenge of dense indoor prediction with …
NamedMask: Distilling segmenters from complementary foundation models
The goal of this work is to segment and name regions of images without access to pixel-level
labels during training. To tackle this task, we construct segmenters by distilling the …
Zero-shot unsupervised transfer instance segmentation
Segmentation is a core computer vision competency, with applications spanning a broad
range of scientifically and economically valuable domains. To date, however, the prohibitive …
Unsupervised object localization in the era of self-supervised ViTs: A survey
The recent enthusiasm for open-world vision systems shows the high interest of the
community to perform perception tasks outside of the closed-vocabulary benchmark setups …
OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks
We introduce a self-supervised pretraining method called OccFeat for camera-only Bird's-
Eye-View (BEV) segmentation networks. With OccFeat we pretrain a BEV network via …
Semantic segmentation of urban environments: Leveraging U-Net deep learning model for cityscape image analysis
Semantic segmentation of cityscapes via deep learning is an essential and game-changing
research topic that offers a more nuanced comprehension of urban landscapes. Deep …