Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Deep vision multimodal learning: Methodology, benchmark, and trend

W Chai, G Wang - Applied Sciences, 2022 - mdpi.com
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …

Provable dynamic fusion for low-quality multimodal data

Q Zhang, H Wu, C Zhang, Q Hu, H Fu… - International …, 2023 - proceedings.mlr.press
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …

Modality competition: What makes joint training of multi-modal network fail in deep learning? (Provably)

Y Huang, J Lin, C Zhou, H Yang… - … conference on machine …, 2022 - proceedings.mlr.press
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …

On uni-modal feature learning in supervised multi-modal learning

C Du, J Teng, T Li, Y Liu, T Yuan… - International …, 2023 - proceedings.mlr.press
We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal
features, which can be learned from uni-modal training, and 2) paired features, which can …

PMR: Prototypical modal rebalance for multimodal learning

Y Fan, W Xu, H Wang, J Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to
compensate for their inherent limitations. However, existing MML methods often optimize a …

Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition

B Li, H Fei, L Liao, Y Zhao, C Teng, TS Chua… - Proceedings of the 31st …, 2023 - dl.acm.org
It has been a hot research topic to enable machines to understand human emotions in
multimodal contexts under dialogue scenarios, which is tasked with multimodal emotion …

GTP-4o: Modality-prompted heterogeneous graph learning for omni-modal biomedical representation

C Li, X Liu, C Wang, Y Liu, W Yu, J Shao… - European conference on …, 2024 - Springer
Recent advances in learning multi-modal representation have witnessed the success in
biomedical domains. While established techniques enable handling multi-modal …

Enhancing multimodal cooperation via sample-level modality valuation

Y Wei, R Feng, Z Wang, D Hu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
One primary topic of multimodal learning is to jointly incorporate heterogeneous information
from different modalities. However, most models often suffer from unsatisfactory multimodal …

Self-supervised intra-modal and cross-modal contrastive learning for point cloud understanding

Y Wu, J Liu, M Gong, P Gong, X Fan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Learning effective representations from unlabeled data is a challenging task for point cloud
understanding. As the human visual system can map concepts learned from 2D images to …