ClusterFormer: Clustering as a universal visual learner
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …
Learning equivariant segmentation with instance-unique querying
Prevalent state-of-the-art instance segmentation methods fall into a query-based scheme, in
which instance masks are derived by querying the image feature using a set of instance …
Tf-blender: Temporal feature blender for video object detection
Video object detection is a challenging task because isolated video frames may
encounter appearance deterioration, which introduces great confusion for detection. One of …
Clustseg: Clustering for universal segmentation
We present CLUSTSEG, a general, transformer-based framework that tackles different
image segmentation tasks (i.e., superpixel, semantic, instance, and panoptic) through a …
Physical attack on monocular depth estimation with optimal adversarial patches
Deep learning has substantially boosted the performance of Monocular Depth Estimation
(MDE), a critical component in fully vision-based autonomous driving (AD) systems (e.g., …
Eigenplaces: Training viewpoint robust models for visual place recognition
Visual Place Recognition is a task that aims to predict the place of an image (called
query) based solely on its visual features. This is typically done through image retrieval …
Deep unsupervised part-whole relational visual saliency
Y Liu, X Dong, D Zhang, S Xu - Neurocomputing, 2024 - Elsevier
Deep Supervised Salient Object Detection (SSOD) excessively relies on large-scale
annotated pixel-level labels which consume intensive labour acquiring high quality …
Where is your place, visual place recognition?
Visual Place Recognition (VPR) is often characterized as being able to recognize
the same place despite significant changes in appearance and viewpoint. VPR is a key …
Deep visual geo-localization benchmark
In this paper, we propose a new open-source benchmarking framework for Visual
Geo-localization (VG) that allows building, training, and testing a wide range of commonly used …
A survey on map-based localization techniques for autonomous vehicles
Autonomous vehicles integrate complex software stacks for realizing the necessary iterative
perception, planning, and action operations. One of the foundational layers of such stacks is …