3D object detection for autonomous driving: A comprehensive survey

J Mao, S Shi, X Wang, H Li - International Journal of Computer Vision, 2023 - Springer
In recent years, autonomous driving has received increasing attention for its potential
to relieve drivers' burdens and improve driving safety. In modern autonomous driving …

Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe

H Li, C Sima, J Dai, W Wang, L Lu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention from both industry and academia. Conventional …

BEVFusion: Multi-task multi-sensor fusion with unified bird's-eye view representation

Z Liu, H Tang, A Amini, X Yang, H Mao… - … on robotics and …, 2023 - ieeexplore.ieee.org
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with …

TransFusion: Robust LiDAR-camera fusion for 3D object detection with transformers

X Bai, Z Hu, X Zhu, Q Huang, Y Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
LiDAR and camera are two important sensors for 3D object detection in autonomous driving.
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …

FUTR3D: A unified sensor fusion framework for 3D detection

X Chen, T Zhang, Y Wang, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Sensor fusion is an essential topic in many perception systems, such as autonomous driving
and robotics. Existing multi-modal 3D detection models usually involve customized designs …

DeepInteraction: 3D object detection via modality interaction

Z Yang, J Chen, Z Miao, W Li… - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing top-performance 3D object detectors typically rely on the multi-modal fusion
strategy. This design is however fundamentally restricted due to overlooking the modality …

Bird's-eye-view scene graph for vision-language navigation

R Liu, X Wang, W Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Vision-language navigation (VLN), which requires an agent to navigate 3D
environments following human instructions, has shown great advances. However, current …

Deformable feature aggregation for dynamic multi-modal 3D object detection

Z Chen, Z Li, S Zhang, L Fang, Q Jiang… - European conference on …, 2022 - Springer
Point clouds and RGB images are two common perception sources in autonomous driving.
The former provides accurate object localization, while the latter is denser and richer in …

FlatFormer: Flattened window attention for efficient point cloud transformer

Z Liu, X Yang, H Tang, S Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Transformer, as an alternative to CNNs, has proven effective in many modalities (e.g.,
texts and images). For 3D point cloud transformers, existing efforts focus primarily on …

GD-MAE: Generative decoder for MAE pre-training on LiDAR point clouds

H Yang, T He, J Liu, H Chen, B Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite the tremendous progress of Masked Autoencoders (MAE) in developing vision tasks
such as image and video, exploring MAE in large-scale 3D point clouds remains …