Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Deep vision multimodal learning: Methodology, benchmark, and trend
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …
Provable dynamic fusion for low-quality multimodal data
The inherent challenge of multimodal fusion is to precisely capture the cross-modal
correlation and flexibly conduct cross-modal interaction. To fully release the value of each …
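The snippet above stops short of the method itself, but a minimal sketch can illustrate what a dynamic (sample-level) fusion scheme of this kind may look like. The module names, the entropy-based confidence estimate, and the two-modality setup below are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceWeightedFusion(nn.Module):
    """Illustrative dynamic late fusion: each modality's logits are weighted
    per sample by a confidence score (here, negative predictive entropy).
    A generic sketch, not the formulation of any specific paper."""

    def __init__(self, dim_a: int, dim_b: int, num_classes: int):
        super().__init__()
        self.head_a = nn.Linear(dim_a, num_classes)  # e.g. audio branch
        self.head_b = nn.Linear(dim_b, num_classes)  # e.g. visual branch

    @staticmethod
    def _confidence(logits: torch.Tensor) -> torch.Tensor:
        # Higher confidence when the predictive distribution has lower entropy.
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1, keepdim=True)
        return -entropy  # shape (batch, 1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        logits_a, logits_b = self.head_a(feat_a), self.head_b(feat_b)
        conf = torch.cat([self._confidence(logits_a), self._confidence(logits_b)], dim=-1)
        w = F.softmax(conf, dim=-1)  # per-sample fusion weights, sum to 1
        return w[:, :1] * logits_a + w[:, 1:] * logits_b

# usage sketch with dummy features
fusion = ConfidenceWeightedFusion(dim_a=128, dim_b=256, num_classes=10)
fused_logits = fusion(torch.randn(4, 128), torch.randn(4, 256))
```

The point of the sketch is only that the fusion weights are computed per sample from the modalities' own predictions, so a low-quality modality can be down-weighted at test time.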
Modality competition: What makes joint training of multi-modal network fail in deep learning? (Provably)
Despite the remarkable success of deep multi-modal learning in practice, it has not been
well-explained in theory. Recently, it has been observed that the best uni-modal network …
On uni-modal feature learning in supervised multi-modal learning
We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal
features, which can be learned from uni-modal training, and 2) paired features, which can …
PMR: Prototypical modal rebalance for multimodal learning
Multimodal learning (MML) aims to jointly exploit the common priors of different modalities to
compensate for their inherent limitations. However, existing MML methods often optimize a …
Revisiting disentanglement and fusion on modality and context in conversational multimodal emotion recognition
Enabling machines to understand human emotions in multimodal contexts under dialogue
scenarios has been a hot research topic, which is tasked with multimodal emotion …
GTP-4o: Modality-prompted heterogeneous graph learning for omni-modal biomedical representation
Recent advances in multi-modal representation learning have seen success in
biomedical domains. While established techniques enable handling multi-modal …
Enhancing multimodal cooperation via sample-level modality valuation
One primary topic of multimodal learning is to jointly incorporate heterogeneous information
from different modalities. However, most models often suffer from unsatisfactory multimodal …
Self-supervised intra-modal and cross-modal contrastive learning for point cloud understanding
Learning effective representations from unlabeled data is a challenging task for point cloud
understanding. As the human visual system can map concepts learned from 2D images to …
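For readers unfamiliar with the terminology in this last entry, the following minimal sketch shows how an intra-modal term (two augmented views of the same point cloud) and a cross-modal term (point cloud vs. a paired image) can be combined with a standard InfoNCE loss. The function names, the pairing of inputs, and the weighting factor are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Standard InfoNCE: the i-th anchor should match the i-th positive
    against all other positives in the batch."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature           # (batch, batch) similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

def joint_contrastive_loss(pc_emb_v1, pc_emb_v2, img_emb, lam: float = 0.5):
    """Hypothetical combination of an intra-modal loss (two views of the same
    point cloud) and a cross-modal loss (point cloud vs. its paired 2D image)."""
    intra = info_nce(pc_emb_v1, pc_emb_v2)
    cross = info_nce(pc_emb_v1, img_emb)
    return lam * intra + (1.0 - lam) * cross
```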