Large-scale multi-modal pre-trained models: A comprehensive survey

X Wang, G Chen, G Qian, P Gao, XY Wei… - Machine Intelligence …, 2023 - Springer
With the urgent demand for generalized deep models, many pre-trained big models are
proposed, such as bidirectional encoder representations (BERT), vision transformer (ViT) …

SeqTrack: Sequence to sequence learning for visual object tracking

X Chen, H Peng, D Wang, H Lu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we present a new sequence-to-sequence learning framework for visual
tracking, dubbed SeqTrack. It casts visual tracking as a sequence generation problem …

Autoregressive visual tracking

X Wei, Y Bai, Y Zheng, D Shi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present ARTrack, an autoregressive framework for visual object tracking. ARTrack
tackles tracking as a coordinate sequence interpretation task that estimates object …

Universal instance perception as object discovery and retrieval

B Yan, Y Jiang, J Wu, D Wang, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …

Joint feature learning and relation modeling for tracking: A one-stream framework

B Ye, H Chang, B Ma, S Shan, X Chen - European conference on …, 2022 - Springer
The current popular two-stream, two-stage tracking framework extracts the template and the
search region features separately and then performs relation modeling, thus the extracted …

OneTracker: Unifying visual object tracking with foundation models and efficient tuning

L Hong, S Yan, R Zhang, W Li, X Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual object tracking aims to localize the target object of each frame based on its initial
appearance in the first frame. Depending on the input modality, tracking tasks can be divided …

DropMAE: Masked autoencoders with spatial-attention dropout for tracking tasks

Q Wu, T Yang, Z Liu, B Wu, Y Shan… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-
based downstream tasks, including visual object tracking (VOT) and video object …

SwinTrack: A simple and strong baseline for transformer tracking

L Lin, H Fan, Z Zhang, Y Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently Transformer has been largely explored in tracking and shown state-of-the-art
(SOTA) performance. However, existing efforts mainly focus on fusing and enhancing …

Backbone is all your need: A simplified architecture for visual object tracking

B Chen, P Li, L Bai, L Qiao, Q Shen, B Li, W Gan… - European conference on …, 2022 - Springer
Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive
biases has recently drawn extensive interest. However, existing tracking approaches rely on …

Autoregressive queries for adaptive tracking with spatio-temporal transformers

J Xie, B Zhong, Z Mo, S Zhang, L Shi… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rich spatio-temporal information is crucial to capture the complicated target appearance
variations in visual tracking. However, most top-performing tracking algorithms rely on many …