2DPASS: 2D priors assisted semantic segmentation on LiDAR point clouds

X Yan, J Gao, C Zheng, C Zheng, R Zhang… - European conference on …, 2022 - Springer
As camera and LiDAR sensors capture complementary information in autonomous driving,
great efforts have been made to conduct semantic segmentation through multi-modality data …

Cross-domain correlation distillation for unsupervised domain adaptation in nighttime semantic segmentation

H Gao, J Guo, G Wang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The performance of nighttime semantic segmentation is restricted by the poor illumination
and a lack of pixel-wise annotation, which severely limit its application in autonomous …

X-Trans2Cap: Cross-modal knowledge transfer using transformer for 3D dense captioning

Z Yuan, X Yan, Y Liao, Y Guo, G Li… - Proceedings of the …, 2022 - openaccess.thecvf.com
3D dense captioning aims to describe individual objects by natural language in 3D
scenes, where 3D scenes are usually represented as RGB-D scans or point clouds …

MultiEMO: An attention-based correlation-aware multimodal fusion framework for emotion recognition in conversations

T Shi, SL Huang - Proceedings of the 61st Annual Meeting of the …, 2023 - aclanthology.org
Emotion Recognition in Conversations (ERC) is an increasingly popular task in the
Natural Language Processing community, which seeks to achieve accurate emotion …

An information-theoretic approach to transferability in task transfer learning

Y Bao, Y Li, SL Huang, L Zhang… - … conference on image …, 2019 - ieeexplore.ieee.org
Task transfer learning is a popular technique in image processing applications that uses pre-
trained models to reduce the supervision cost of related tasks. An important question is to …

EvDistill: Asynchronous events to end-task learning via bidirectional reconstruction-guided cross-modal knowledge distillation

L Wang, Y Chae, SH Yoon, TK Kim… - Proceedings of the …, 2021 - openaccess.thecvf.com
Event cameras sense per-pixel intensity changes and produce asynchronous event streams
with high dynamic range and less motion blur, showing advantages over the conventional …

Analysis of multimodal data fusion from an information theory perspective

Y Dai, Z Yan, J Cheng, X Duan, G Wang - Information Sciences, 2023 - Elsevier
Inspired by the McGurk effect, studies on multimodal data fusion start with audio-visual
speech recognition tasks. Multimodal data fusion research was not popular for a period of …

Knowledge as priors: Cross-modal knowledge generalization for datasets without superior knowledge

L Zhao, X Peng, Y Chen, M Kapadia… - Proceedings of the …, 2020 - openaccess.thecvf.com
Cross-modal knowledge distillation deals with transferring knowledge from a model trained
with superior modalities (Teacher) to another model trained with weak modalities (Student) …

Fusing modalities by multiplexed graph neural networks for outcome prediction from medical data and beyond

NS D'Souza, H Wang, A Giovannini… - Medical Image …, 2024 - Elsevier
With the emergence of multimodal electronic health records, the evidence for diseases,
events, or findings may be present across multiple modalities ranging from clinical to …

DPNet: Dynamic poly-attention network for trustworthy multi-modal classification

X Zou, C Tang, X Zheng, Z Li, X He, S An… - Proceedings of the 31st …, 2023 - dl.acm.org
With advances in sensing technology, multi-modal data collected from different sources are
increasingly available. Multi-modal classification aims to integrate complementary …