Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

Joint feature learning and relation modeling for tracking: A one-stream framework

B Ye, H Chang, B Ma, S Shan, X Chen - European Conference on …, 2022 - Springer
The current popular two-stream, two-stage tracking framework extracts the template and the
search region features separately and then performs relation modeling, thus the extracted …

SeqTrack: Sequence to sequence learning for visual object tracking

X Chen, H Peng, D Wang, H Lu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we present a new sequence-to-sequence learning framework for visual
tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem …

Universal instance perception as object discovery and retrieval

B Yan, Y Jiang, J Wu, D Wang, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …

SwinTrack: A simple and strong baseline for transformer tracking

L Lin, H Fan, Z Zhang, Y Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently Transformer has been largely explored in tracking and shown state-of-the-art
(SOTA) performance. However, existing efforts mainly focus on fusing and enhancing …

Autoregressive visual tracking

X Wei, Y Bai, Y Zheng, D Shi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present ARTrack, an autoregressive framework for visual object tracking. ARTrack
tackles tracking as a coordinate sequence interpretation task that estimates object …

Backbone is all your need: A simplified architecture for visual object tracking

B Chen, P Li, L Bai, L Qiao, Q Shen, B Li, W Gan… - … on Computer Vision, 2022 - Springer
Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive
biases has recently drawn extensive interest. However, existing tracking approaches rely on …

DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks

Q Wu, T Yang, Z Liu, B Wu, Y Shan… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …

Learn to match: Automatic matching network design for visual tracking

Z Zhang, Y Liu, X Wang, B Li… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Siamese tracking has achieved groundbreaking performance in recent years, where the
essence is the efficient matching operator cross-correlation and its variants. Besides the …

Representation learning for visual object tracking by masked appearance transfer

H Zhao, D Wang, H Lu - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Visual representation plays an important role in visual object tracking. However, few works
study the tracking-specified representation learning method. Most trackers directly use …