A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
A comprehensive survey on applications of transformers for deep learning tasks
Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
VoxelNeXt: Fully sparse VoxelNet for 3D object detection and tracking
3D object detectors usually rely on hand-crafted proxies, e.g., anchors or centers,
and translate well-studied 2D frameworks to 3D. Thus, sparse voxel features need to be …
Multimodal virtual point 3D detection
Lidar-based sensing drives current autonomous vehicles. Despite rapid progress, current
Lidar sensors still lag two decades behind traditional color cameras in terms of resolution …
LoGoNet: Towards accurate 3D object detection with local-to-global cross-modal fusion
LiDAR-camera fusion methods have shown impressive performance in 3D object detection.
Recent advanced multi-modal methods mainly perform global fusion, where image features …
CenterFormer: Center-based transformer for 3D object detection
Query-based transformer has shown great potential in constructing long-range attention in
many image-domain tasks, but has rarely been considered in LiDAR-based 3D object …
DSVT: Dynamic sparse voxel transformer with rotated sets
Designing an efficient yet deployment-friendly 3D backbone to handle sparse point clouds is
a fundamental problem in 3D perception. Compared with the customized sparse …
PersFormer: 3D lane detection via perspective transformer and the OpenLane benchmark
Methods for 3D lane detection have been recently proposed to address the issue of
inaccurate lane layouts in many autonomous driving scenarios (uphill/downhill, bump, etc.) …
3D object detection for autonomous driving: A comprehensive survey
Autonomous driving, in recent years, has been receiving increasing attention for its potential
to relieve drivers' burdens and improve the safety of driving. In modern autonomous driving …
FlatFormer: Flattened window attention for efficient point cloud transformer
Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g.,
texts and images). For 3D point cloud transformers, existing efforts focus primarily on …