LanguageBind: Extending video-language pretraining to n-modality by language-based semantic alignment
B Zhu, B Lin, M Ning, Y Yan, J Cui, HF Wang… - ar…

…Former: Learning cross-modal feature map for visible-to-infrared image translation
H Wang, N Li, H Zhao, Y Wen, Y Su… - Proceedings of the 32nd …, 2024 - dl.acm.org
Due to the limitations of infrared image acquisition conditions, many essential tasks currently
rely on visible images as the main source of training data. However, single-modal data …