A review on methods and applications in multimodal deep learning

S Jabeen, X Li, MS Amin, O Bourahla, S Li… - ACM Transactions on …, 2023 - dl.acm.org
Deep Learning has implemented a wide range of applications and has become increasingly
popular in recent years. The goal of multimodal deep learning (MMDL) is to create models …

Advancing 3D point cloud understanding through deep transfer learning: A comprehensive survey

SS Sohail, Y Himeur, H Kheddar, A Amira, F Fadli… - Information …, 2024 - Elsevier
The 3D point cloud (3DPC) has significantly evolved and benefited from the advance of
deep learning (DL). However, the latter faces various issues, including the lack of data or …

Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …

Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation, and instruction following

Z Guo, R Zhang, X Zhu, Y Tang, X Ma, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Point-Bind, a 3D multi-modality model aligning point clouds with 2D image,
language, audio, and video. Guided by ImageBind, we construct a joint embedding space …

Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning?

R Dong, Z Qi, L Zhang, J Zhang, J Sun, Z Ge… - arXiv preprint arXiv …, 2022 - arxiv.org
The success of deep learning heavily relies on large-scale data with comprehensive labels,
which is more expensive and time-consuming to fetch in 3D compared to 2D images or …

Cross-modal retrieval: a systematic review of methods and future directions

T Wang, F Li, L Zhu, J Li, Z Zhang… - Proceedings of the …, 2025 - ieeexplore.ieee.org
With the exponential surge in diverse multimodal data, traditional unimodal retrieval
methods struggle to meet the needs of users seeking access to data across various …

RONO: robust discriminative learning with noisy labels for 2D-3D cross-modal retrieval

Y Feng, H Zhu, D Peng, X Peng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recently, with the advent of Metaverse and AI Generated Content, cross-modal retrieval
becomes popular with a burst of 2D and 3D data. However, this problem is challenging …

Hypergraph-based multi-modal representation for open-set 3D object retrieval

Y Feng, S Ji, YS Liu, S Du, Q Dai… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The traditional 3D object retrieval (3DOR) task is under the close-set setting, which assumes
the categories of objects in the retrieval stage are all seen in the training stage. Existing …

Deep vision multimodal learning: Methodology, benchmark, and trend

W Chai, G Wang - Applied Sciences, 2022 - mdpi.com
Deep vision multimodal learning aims at combining deep visual representation learning with
other modalities, such as text, sound, and data collected from other sensors. With the fast …

GSSF: Generalized structural sparse function for deep cross-modal metric learning

H Diao, Y Zhang, S Gao, J Zhu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Cross-modal metric learning is a prominent research topic that bridges the semantic
heterogeneity between vision and language. Existing methods frequently utilize simple …