Mixformerv2: Efficient fully transformer tracking

Y Cui, T Song, G Wu, L Wang - Advances in neural …, 2023 - proceedings.neurips.cc
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
However, their efficiency remains an obstacle to practical deployment on both GPU and …

Artrackv2: Prompting autoregressive tracker where to look and how to describe

Y Bai, Z Zhao, Y Gong, X Wei - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We present ARTrackV2 which integrates two pivotal aspects of tracking: determining where
to look (localization) and how to describe (appearance analysis) the target object across …

Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers

J Xie, B Zhong, Z Mo, S Zhang, L Shi… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rich spatio-temporal information is crucial to capture the complicated target appearance
variations in visual tracking. However, most top-performing tracking algorithms rely on many …

Adaptively bypassing vision transformer blocks for efficient visual tracking

X Yang, D Zeng, X Wang, Y Wu, H Ye, Q Zhao, S Li - Pattern Recognition, 2025 - Elsevier
Empowered by transformer-based models, visual tracking has advanced significantly.
However, the slow speed of current trackers limits their applicability on devices with …

A multi-modal transformer network for action detection

M Korban, P Youngs, ST Acton - Pattern Recognition, 2023 - Elsevier
This paper proposes a novel multi-modal transformer network for detecting actions in
untrimmed videos. To enrich the action features, our transformer network utilizes a new multi …

Autogenic language embedding for coherent point tracking

Z Song, Y Tang, R Luo, L Ma, J Yu, YPP Chen… - Proceedings of the …, 2024 - dl.acm.org
Point tracking is a challenging task in computer vision, aiming to establish point-wise
correspondence across long video sequences. Recent advancements have primarily …

A transformer based visual tracker with restricted token interaction and knowledge distillation

N Liu, Y Zhang - Knowledge-Based Systems, 2025 - Elsevier
Recently, one-stream pipelines have made significant progress in visual object tracking
(VOT), where the template and search images interact in early stages. However, one-stream …

CTIFTrack: Continuous Temporal Information Fusion for object track

Z Zhang, Z Guo, L Wang, Y Li - Expert Systems with Applications, 2025 - Elsevier
In visual tracking tasks, researchers usually focus on increasing the complexity of the model
or only discretely focusing on the changes in the object itself to achieve accurate recognition …

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

X Hu, Y Tai, X Zhao, C Zhao, Z Zhang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal tracking has garnered widespread attention as a result of its ability to effectively
address the inherent limitations of traditional RGB tracking. However, existing multimodal …

Masked Image Modeling: A Survey

V Hondru, FA Croitoru, S Minaee, RT Ionescu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we survey recent studies on masked image modeling (MIM), an approach that
emerged as a powerful self-supervised learning technique in computer vision. The MIM task …