SDSTrack: Self-distillation symmetric adapter learning for multi-modal visual object tracking

X Hou, J Xing, Y Qian, Y Guo, S Xin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Visual Object Tracking (VOT) has recently gained significant attention
due to its robustness. Early research focused on fully fine-tuning RGB-based trackers which …

Single-model and any-modality for video object tracking

Z Wu, J Zheng, X Ren, FA Vasluianu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the realm of video object tracking, auxiliary modalities such as depth, thermal, or event data
have emerged as valuable assets to complement the RGB trackers. In practice, most existing …

CMDA: Cross-modality domain adaptation for nighttime semantic segmentation

R Xia, C Zhao, M Zheng, Z Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most nighttime semantic segmentation studies are based on domain adaptation approaches
and image input. However, limited by the low dynamic range of conventional cameras …

DFormer: Rethinking RGBD representation learning for semantic segmentation

B Yin, X Zhang, Z Li, L Liu, MM Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
We present DFormer, a novel RGB-D pretraining framework to learn transferable
representations for RGB-D segmentation tasks. DFormer has two new key innovations: 1) …

Sigma: Siamese Mamba network for multi-modal semantic segmentation

Z Wan, P Zhang, Y Wang, S Yong, S Stepputtis… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-modal semantic segmentation significantly enhances AI agents' perception and scene
understanding, especially under adverse conditions like low-light or overexposed …

PolyMaX: General dense prediction with mask transformer

X Yang, L Yuan, K Wilber, A Sharma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Dense prediction tasks, such as semantic segmentation, depth estimation, and surface
normal prediction, can be easily formulated as per-pixel classification (discrete outputs) or …

Caltech aerial RGB-thermal dataset in the wild

C Lee, M Anderson, N Ranganathan, X Zuo… - … on Computer Vision, 2024 - Springer
We present the first publicly-available RGB-thermal dataset designed for aerial robotics
operating in natural environments. Our dataset captures a variety of terrain across the United …

Multimodal feature-guided pre-training for RGB-T perception

J Ouyang, P Jin, Q Wang - IEEE Journal of Selected Topics in …, 2024 - ieeexplore.ieee.org
Wide-range multiscale object detection for multispectral scene perception from a drone
perspective is challenging. Previous RGB-T perception methods directly use backbone …

Rethinking reverse distillation for multi-modal anomaly detection

Z Gu, J Zhang, L Liu, X Chen, J Peng, Z Gan… - Proceedings of the …, 2024 - ojs.aaai.org
In recent years, there has been significant progress in employing color images for anomaly
detection in industrial scenarios, but it is insufficient for identifying anomalies that are …

UTFNet: Uncertainty-guided trustworthy fusion network for RGB-thermal semantic segmentation

Q Wang, C Yin, H Song, T Shen… - IEEE Geoscience and …, 2023 - ieeexplore.ieee.org
In real-world scenarios, the information quality provided by RGB and thermal (RGB-T)
sensors often varies across samples. This variation will negatively impact the performance of …