Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
Speaker recognition based on deep learning: An overview
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is lack of …
LightGlue: Local feature matching at light speed
We introduce LightGlue, a deep neural network that learns to match local features across
images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
R2Former: Unified retrieval and reranking transformer for place recognition
Visual Place Recognition (VPR) estimates the location of query images by matching
them with images in a reference database. Conventional methods generally adopt …
Patch-NetVLAD: Multi-scale fusion of locally-global descriptors for place recognition
Visual Place Recognition is a challenging task for robotics and autonomous
systems, which must deal with the twin problems of appearance and viewpoint change in an …
Object-centric learning with slot attention
Learning object-centric representations of complex scenes is a promising step towards
enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep …
Rethinking visual geo-localization for large-scale applications
Visual Geo-localization (VG) is the task of estimating the position where a given photo was
taken by comparing it with a large database of images of known locations. To investigate …
CLIP2Video: Mastering video-text retrieval via image CLIP
We present CLIP2Video network to transfer the image-language pre-training model to video-
text retrieval in an end-to-end manner. Leading approaches in the domain of video-and …
Deep learning for 3D point clouds: A survey
Point cloud learning has lately attracted increasing attention due to its wide applications in
many areas, such as computer vision, autonomous driving, and robotics. As a dominating …