BEVFormer: Learning Bird's-Eye-View Representation From LiDAR-Camera Via Spatiotemporal Transformers

Z Li, W Wang, H Li, E Xie, C Sima, T Lu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multi-modality fusion strategy is currently the de facto most competitive solution for 3D
perception tasks. In this work, we present a new framework termed BEVFormer, which learns …

Planning-oriented autonomous driving

Y Hu, J Yang, L Chen, K Li, C Sima… - Proceedings of the …, 2023 - openaccess.thecvf.com
Modern autonomous driving system is characterized as modular tasks in sequential order,
i.e., perception, prediction, and planning. In order to perform a wide diversity of tasks and …

ByteTrack: Multi-object tracking by associating every detection box

Y Zhang, P Sun, Y Jiang, D Yu, F Weng, Z Yuan… - European conference on …, 2022 - Springer
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in
videos. Most methods obtain identities by associating detection boxes whose scores are …

Exploring object-centric temporal modeling for efficient multi-view 3D object detection

S Wang, Y Liu, T Wang, Y Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for
multi-view 3D object detection. Built upon the sparse query design in the PETR series, we …

ViP3D: End-to-end visual trajectory prediction via 3D agent queries

J Gu, C Hu, T Zhang, X Chen, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Perception and prediction are two separate modules in existing autonomous driving
systems. They interact with each other via hand-picked features such as agent bounding …

Standing between past and future: Spatio-temporal modeling for multi-camera 3D multi-object tracking

Z Pang, J Li, P Tokmakov, D Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
This work proposes an end-to-end multi-camera 3D multi-object tracking (MOT) framework. It
emphasizes spatio-temporal continuity and integrates both past and future reasoning for …

Visual point cloud forecasting enables scalable autonomous driving

Z Yang, L Chen, Y Sun, H Li - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In contrast to extensive studies on general vision, pre-training for scalable visual
autonomous driving remains seldom explored. Visual autonomous driving applications …

Sparse4D: Multi-view 3D object detection with sparse spatial-temporal fusion

X Lin, T Lin, Z Pei, L Huang, Z Su - arXiv preprint arXiv:2211.10581, 2022 - arxiv.org
Bird's-eye-view (BEV) based methods have made great progress recently in the multi-view 3D
detection task. Compared with BEV-based methods, sparse-based methods lag behind in …

Exploring recurrent long-term temporal fusion for multi-view 3D perception

C Han, J Yang, J Sun, Z Ge, R Dong… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Long-term temporal fusion is a crucial but often overlooked technique in camera-based
Bird's-Eye-View (BEV) 3D perception. Existing methods mostly operate in a parallel manner …

Panacea: Panoramic and controllable video generation for autonomous driving

Y Wen, Y Zhao, Y Liu, F Jia, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The field of autonomous driving increasingly demands high-quality annotated training data.
In this paper, we propose Panacea, an innovative approach to generate panoramic and …