Deep learning for instance retrieval: A survey

W Chen, Y Liu, W Wang, EM Bakker… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
In recent years a vast amount of visual content has been generated and shared from many
fields, such as social media platforms, medical imaging, and robotics. This abundance of …

SIFT meets CNN: A decade survey of instance retrieval

L Zheng, Y Yang, Q Tian - IEEE transactions on pattern …, 2017 - ieeexplore.ieee.org
In the early days, content-based image retrieval (CBIR) was studied with global features.
Since 2003, image retrieval based on local descriptors (de facto SIFT) has been extensively …

Diffusion art or digital forgery? Investigating data replication in diffusion models

G Somepalli, V Singla, M Goldblum… - Proceedings of the …, 2023 - openaccess.thecvf.com
Cutting-edge diffusion models produce images with high quality and customizability,
enabling them to be used for commercial art and graphic design purposes. But do diffusion …

MixVPR: Feature mixing for visual place recognition

A Ali-Bey, B Chaib-Draa… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous
driving as well as other computer vision tasks. It refers to the process of identifying a place …

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks

M Goldblum, H Souri, R Ni, M Shu… - Advances in …, 2023 - proceedings.neurips.cc
Neural network based computer vision systems are typically built on a backbone, a
pretrained or randomly initialized feature extractor. Several years ago, the default option was …

Rethinking visual geo-localization for large-scale applications

G Berton, C Masone, B Caputo - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Visual Geo-localization (VG) is the task of estimating the position where a given photo was
taken by comparing it with a large database of images of known locations. To investigate …

Cost aggregation with 4D convolutional Swin transformer for few-shot segmentation

S Hong, S Cho, J Nam, S Lin, S Kim - European Conference on Computer …, 2022 - Springer
This paper presents a novel cost aggregation network, called Volumetric Aggregation with
Transformers (VAT), for few-shot segmentation. The use of transformers can benefit …

An empirical study of remote sensing pretraining

D Wang, J Zhang, B Du, GS Xia… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Deep learning has largely reshaped remote sensing (RS) research for aerial image
understanding and made a great success. Nevertheless, most of the existing deep models …

MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI

K Ying, F Meng, J Wang, Z Li, H Lin, Y Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) show significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …

Image matching from handcrafted to deep features: A survey

J Ma, X Jiang, A Fan, J Jiang, J Yan - International Journal of Computer …, 2021 - Springer
As a fundamental and critical task in various visual applications, image matching can identify
then correspond the same or similar structure/content from two or more images. Over the …