MDNet: Mamba-effective diffusion-distillation network for RGB-thermal urban dense prediction

W Zhou, H Wu, Q Jiang - … on Circuits and Systems for Video …, 2024 - ieeexplore.ieee.org
In recent years, significant progress has been achieved in urban dense prediction tasks,
particularly with advancements in deep learning models and novel architectures that …

AFter: Attention-based fusion router for RGBT tracking

A Lu, W Wang, C Li, J Tang, B Luo - arXiv preprint arXiv:2405.02717, 2024 - arxiv.org
Multi-modal feature fusion as a core investigative component of RGBT tracking emerges
numerous fusion studies in recent years. However, existing RGBT tracking methods widely …

MambaVT: Spatio-temporal contextual modeling for robust RGB-T tracking

S Lai, C Liu, J Zhu, B Kang, Y Liu, D Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing RGB-T tracking algorithms have made remarkable progress by leveraging the
global interaction capability and extensive pre-trained models of the Transformer …

RGBT tracking via frequency-aware feature enhancement and unidirectional mixed attention

J Zhang, J Yang, Z Liu, J Wang - Neurocomputing, 2025 - Elsevier
RGBT object tracking is widely used due to the complementary nature of RGB and TIR
modalities. However, RGBT trackers based on Transformer or CNN face significant …

Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking

X Hu, Y Tai, X Zhao, C Zhao, Z Zhang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal tracking has garnered widespread attention as a result of its ability to effectively
address the inherent limitations of traditional RGB tracking. However, existing multimodal …

Top-down cross-modal guidance for robust RGB-T tracking

L Chen, B Zhong, Q Liang, Y Zheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Most RGB-T trackers heavily rely on bottom-up attention and thus overlook top-down cross-
modal guidance for learning target features. Consequently, the discriminative power of the …

Exploring Multi-modal Spatial-Temporal Contexts for High-performance RGB-T Tracking

T Zhang, Q Jiao, Q Zhang, J Han - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
In RGB-T tracking, there exist rich spatial relationships between the target and backgrounds
within multi-modal data as well as sound consistencies of spatial relationships among …

MambaEVT: Event stream based visual object tracking using state space model

X Wang, S Wang, X Wang, Z Zhao, L Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Event camera-based visual tracking has drawn more and more attention in recent years due
to the unique imaging principle and advantages of low energy consumption, high dynamic …

Visual Object Tracking across Diverse Data Modalities: A Review

M Wang, T Ma, S Xin, X Hou, J Xing, G Dai… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual Object Tracking (VOT) is an attractive and significant research area in computer
vision, which aims to recognize and track specific targets in video sequences where the …

Towards a generalist and blind RGB-X tracker

Y Tan, Z Wu, Y Fu, Z Zhou, G Sun, C Ma… - arXiv preprint arXiv …, 2024 - arxiv.org
With the emergence of a single large model capable of successfully solving a multitude of
tasks in NLP, there has been growing research interest in achieving similar goals in …