Large-scale multi-modal pre-trained models: A comprehensive survey
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …
Joint feature learning and relation modeling for tracking: A one-stream framework
The current popular two-stream, two-stage tracking framework extracts the template and the
search region features separately and then performs relation modeling, thus the extracted …
SeqTrack: Sequence to sequence learning for visual object tracking
In this paper, we present a new sequence-to-sequence learning framework for visual
tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
SwinTrack: A simple and strong baseline for transformer tracking
Recently Transformer has been largely explored in tracking and shown state-of-the-art
(SOTA) performance. However, existing efforts mainly focus on fusing and enhancing …
Autoregressive visual tracking
We present ARTrack, an autoregressive framework for visual object tracking. ARTrack
tackles tracking as a coordinate sequence interpretation task that estimates object …
Backbone is all your need: A simplified architecture for visual object tracking
Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive
biases has recently drawn extensive interest. However, existing tracking approaches rely on …
DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks
In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …
Learn to match: Automatic matching network design for visual tracking
Siamese tracking has achieved groundbreaking performance in recent years, where the
essence is the efficient matching operator cross-correlation and its variants. Besides the …
Representation learning for visual object tracking by masked appearance transfer
Visual representation plays an important role in visual object tracking. However, few works
study the tracking-specified representation learning method. Most trackers directly use …