Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Delivering arbitrary-modal semantic segmentation

J Zhang, R Liu, H Shi, K Yang, S Reiß… - Proceedings of the …, 2023 - openaccess.thecvf.com
Multimodal fusion can make semantic segmentation more robust. However, fusing an
arbitrary number of modalities remains underexplored. To delve into this problem, we create …

CMX: Cross-modal fusion for RGB-X semantic segmentation with transformers

J Zhang, H Liu, K Yang, X Hu, R Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Scene understanding based on image segmentation is a crucial component of autonomous
vehicles. Pixel-wise semantic segmentation of RGB images can be advanced by exploiting …

CAGroup3D: Class-aware grouping for 3D object detection on point clouds

H Wang, L Ding, S Dong, S Shi, A Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
We present a novel two-stage fully sparse convolutional 3D object detection framework,
named CAGroup3D. Our proposed method first generates some high-quality 3D proposals …

From sparse to soft mixtures of experts

J Puigcerver, C Riquelme, B Mustafa… - ar…
… developing perception systems capable of
making accurate, robust, and rapid decisions to interpret the driving environment effectively …