MixFormerV2: Efficient fully transformer tracking
Transformer-based trackers have achieved strong accuracy on the standard benchmarks.
However, their efficiency remains an obstacle to practical deployment on both GPU and …
ARTrackV2: Prompting autoregressive tracker where to look and how to describe
We present ARTrackV2, which integrates two pivotal aspects of tracking: determining where
to look (localization) and how to describe (appearance analysis) the target object across …
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
The rich spatio-temporal information is crucial to capture the complicated target appearance
variations in visual tracking. However, most top-performing tracking algorithms rely on many …
Adaptively bypassing vision transformer blocks for efficient visual tracking
Empowered by transformer-based models, visual tracking has advanced significantly.
However, the slow speed of current trackers limits their applicability on devices with …
A multi-modal transformer network for action detection
This paper proposes a novel multi-modal transformer network for detecting actions in
untrimmed videos. To enrich the action features, our transformer network utilizes a new multi …
Autogenic language embedding for coherent point tracking
Point tracking is a challenging task in computer vision, aiming to establish point-wise
correspondence across long video sequences. Recent advancements have primarily …
A transformer based visual tracker with restricted token interaction and knowledge distillation
N Liu, Y Zhang - Knowledge-Based Systems, 2025 - Elsevier
Recently, one-stream pipelines have made significant progress in visual object tracking
(VOT), where the template and search images interact in early stages. However, one-stream …
CTIFTrack: Continuous Temporal Information Fusion for object track
Z Zhang, Z Guo, L Wang, Y Li - Expert Systems with Applications, 2025 - Elsevier
In visual tracking tasks, researchers usually focus on increasing the complexity of the model
or only discretely focusing on the changes in the object itself to achieve accurate recognition …
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Multimodal tracking has garnered widespread attention as a result of its ability to effectively
address the inherent limitations of traditional RGB tracking. However, existing multimodal …
Masked Image Modeling: A Survey
In this work, we survey recent studies on masked image modeling (MIM), an approach that
emerged as a powerful self-supervised learning technique in computer vision. The MIM task …