Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention both from industry and academia. Conventional …
and drawing extensive attention both from industry and academia. Conventional …
3D object detection for autonomous driving: A comprehensive survey
Autonomous driving, in recent years, has been receiving increasing attention for its potential
to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving …
to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving …
Ulip: Learning a unified representation of language, images, and point clouds for 3d understanding
The recognition capabilities of current state-of-the-art 3D models are limited by datasets with
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …
a small number of annotated data and a pre-defined set of categories. In its 2D counterpart …
Transfusion: Robust lidar-camera fusion for 3d object detection with transformers
LiDAR and camera are two important sensors for 3D object detection in autonomous driving.
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …
Despite the increasing popularity of sensor fusion in this field, the robustness against inferior …
Stratified transformer for 3d point cloud segmentation
Abstract 3D point cloud segmentation has made tremendous progress in recent years. Most
current methods focus on aggregating local features, but fail to directly model long-range …
current methods focus on aggregating local features, but fail to directly model long-range …
Ulip-2: Towards scalable multimodal pre-training for 3d understanding
Recent advancements in multimodal pre-training have shown promising efficacy in 3D
representation learning by aligning multimodal features across 3D shapes their 2D …
representation learning by aligning multimodal features across 3D shapes their 2D …
Detrs with hybrid matching
One-to-one set matching is a key design for DETR to establish its end-to-end capability, so
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …
that object detection does not require a hand-crafted NMS (non-maximum suppression) to …
Rethinking network design and local geometry in point cloud: A simple residual MLP framework
Point cloud analysis is challenging due to irregularity and unordered data structure. To
capture the 3D geometries, prior works mainly rely on exploring sophisticated local …
capture the 3D geometries, prior works mainly rely on exploring sophisticated local …
A survey of visual transformers
Transformer, an attention-based encoder–decoder model, has already revolutionized the
field of natural language processing (NLP). Inspired by such significant achievements, some …
field of natural language processing (NLP). Inspired by such significant achievements, some …
Surface representation for point clouds
Most prior work represents the shapes of point clouds by coordinates. However, it is
insufficient to describe the local geometry directly. In this paper, we present RepSurf …
insufficient to describe the local geometry directly. In this paper, we present RepSurf …