3D object detection for autonomous driving: A comprehensive survey

J Mao, S Shi, X Wang, H Li - International Journal of Computer Vision, 2023 - Springer
Autonomous driving, in recent years, has been receiving increasing attention for its potential
to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving …

Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe

H Li, C Sima, J Dai, W Wang, L Lu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention both from industry and academia. Conventional …

ULIP: Learning a unified representation of language, images, and point clouds for 3D understanding

L Xue, M Gao, C Xing, R Martín-Martín… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …

TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers

X Bai, Z Hu, X Zhu, Q Huang, Y Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
LiDAR and camera are two important sensors for 3D object detection in autonomous driving.
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …

Stratified transformer for 3D point cloud segmentation

X Lai, J Liu, L Jiang, L Wang, H Zhao… - Proceedings of the …, 2022 - openaccess.thecvf.com
3D point cloud segmentation has made tremendous progress in recent years. Most
current methods focus on aggregating local features, but fail to directly model long-range …

ULIP-2: Towards scalable multimodal pre-training for 3D understanding

L Xue, N Yu, S Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes, their 2D …

Mask3D: Mask transformer for 3D semantic instance segmentation

J Schult, F Engelmann, A Hermans… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Modern 3D semantic instance segmentation approaches predominantly rely on specialized
voting mechanisms followed by carefully designed geometric clustering techniques. Building …

DETRs with hybrid matching

D Jia, Y Yuan, H He, X Wu, H Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …

OctFormer: Octree-based transformers for 3D point clouds

PS Wang - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org
We propose octree-based transformers, named OctFormer, for 3D point cloud learning.
OctFormer can not only serve as a general and effective backbone for 3D point cloud …

DSVT: Dynamic sparse voxel transformer with rotated sets

H Wang, C Shi, S Shi, M Lei, S Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is
a fundamental problem in 3D perception. Compared with the customized sparse …