Vision-centric BEV perception: A survey

Y Ma, T Wang, X Bai, H Yang, Y Hou… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …

Grounded SAM: Assembling open-world models for diverse visual tasks

T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to
combine with the segment anything model (SAM). This integration enables the detection and …

TAPTR: Tracking any point with transformers as detection

H Li, H Zhang, S Liu, Z Zeng, T Ren, F Li… - European Conference on …, 2024 - Springer
In this paper, we propose a simple yet effective approach for Tracking Any Point with
TRansformers (TAPTR). Based on the observation that point tracking bears a great …

OPEN: Object-wise position embedding for multi-view 3D object detection

J Hou, T Wang, X Ye, Z Liu, S Gong, X Tan… - … on Computer Vision, 2024 - Springer
Accurate depth information is crucial for enhancing the performance of multi-view 3D object
detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise …

Context and geometry aware voxel transformer for semantic scene completion

Z Yu, R Zhang, J Ying, J Yu, X Hu, L Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-based Semantic Scene Completion (SSC) has gained much attention due to its
widespread applications in various 3D perception tasks. Existing sparse-to-dense …

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

X Liu, C Zheng, M Qian, N Xue… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Multi-View Attentive Contextualization (MvACon), a simple yet effective
method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object …

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

LH Chen, S Lu, A Zeng, H Zhang, B Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
This study delves into the realm of multi-modality (i.e., video and motion modalities) human
behavior understanding by leveraging the powerful capabilities of Large Language Models …

Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors

H Ji, P Liang, E Cheng - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Multi-camera-based 3D object detection has made notable progress in the past several
years. However, we observe that there are cases (e.g., faraway regions) in which popular 2D …

LinkOcc: 3D Semantic Occupancy Prediction with Temporal Association

W Ouyang, Z Xu, B Shen, J Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
3D semantic occupancy has garnered considerable attention due to its abundant structural
information encompassing the entire autonomous driving scene. However, existing 3D …

CoreNet: Conflict Resolution Network for point-pixel misalignment and sub-task suppression of 3D LiDAR-camera object detection

Y Li, Y Yang, Z Lei - Information Fusion, 2025 - Elsevier
Fusing multi-modality inputs from different sensors is an effective way to improve the
performance of 3D object detection. However, current methods overlook two important …