BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera Via Spatiotemporal Transformers
Multi-modality fusion strategy is currently the de-facto most competitive solution for 3D
perception tasks. In this work, we present a new framework termed BEVFormer, which learns …
Planning-oriented autonomous driving
Modern autonomous driving system is characterized as modular tasks in sequential order,
i.e., perception, prediction, and planning. In order to perform a wide diversity of tasks and …
ByteTrack: Multi-object tracking by associating every detection box
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …
Exploring object-centric temporal modeling for efficient multi-view 3D object detection
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for
multi-view 3D object detection. Built upon the sparse query design in the PETR series, we …
ViP3D: End-to-end visual trajectory prediction via 3D agent queries
Perception and prediction are two separate modules in the existing autonomous driving
systems. They interact with each other via hand-picked features such as agent bounding …
Standing between past and future: Spatio-temporal modeling for multi-camera 3D multi-object tracking
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It
emphasizes spatio-temporal continuity and integrates both past and future reasoning for …
Visual point cloud forecasting enables scalable autonomous driving
In contrast to extensive studies on general vision, pre-training for scalable visual
autonomous driving remains seldom explored. Visual autonomous driving applications …
Sparse4D: Multi-view 3D object detection with sparse spatial-temporal fusion
Bird-eye-view (BEV) based methods have made great progress recently in multi-view 3D
detection task. Compared with BEV based methods, sparse based methods lag behind in …
Exploring recurrent long-term temporal fusion for multi-view 3D perception
Long-term temporal fusion is a crucial but often overlooked technique in camera-based
Bird's-Eye-View (BEV) 3D perception. Existing methods are mostly in a parallel manner …
Panacea: Panoramic and controllable video generation for autonomous driving
The field of autonomous driving increasingly demands high-quality annotated training data.
In this paper, we propose Panacea, an innovative approach to generate panoramic and …