Crema: Multimodal compositional video reasoning via efficient modular adaptation and fusion
Despite impressive advancements in multimodal compositional reasoning approaches, they
are still limited in their flexibility and efficiency by processing fixed modality inputs while …
Comprehensive review on 3D point cloud segmentation in plants
H Song, W Wen, S Wu, X Guo - Artificial Intelligence in Agriculture, 2025 - Elsevier
Segmentation of three-dimensional (3D) point clouds is fundamental in comprehending
unstructured structural and morphological data. It plays a critical role in research related to …
Point-to-pixel prompting for point cloud analysis with pre-trained image models
Nowadays, pre-training big models on large-scale datasets has achieved great success and
dominated many downstream tasks in natural language processing and 2D vision, while pre …
RGB-D Cube R-CNN: 3D Object Detection with Selective Modality Dropout
In this paper we create an RGB-D 3D object detector targeted at indoor robotics use cases
where one modality may be unavailable due to a specific sensor setup or a sensor failure …
Weakly Supervised Point Cloud Semantic Segmentation via Artificial Oracle
Manual annotation of every point in a point cloud is a costly and labor-intensive process.
While weakly supervised point cloud semantic segmentation (WSPCSS) with sparse …
Masked Image Modeling: A Survey
In this work, we survey recent studies on masked image modeling (MIM), an approach that
emerged as a powerful self-supervised learning technique in computer vision. The MIM task …
Infrastructure 3D Target detection based on multi-mode fusion for intelligent and connected vehicles
X Zhang, L He, R Lv, C **, Y Wang - IEEE Access, 2023 - ieeexplore.ieee.org
Autonomous driving technology faces significant safety challenges due to the lack of a
global perspective and the limitations of long-range perception capabilities. It is widely …