Vision-centric BEV perception: A survey
In recent years, vision-centric Bird's Eye View (BEV) perception has garnered significant
interest from both industry and academia due to its inherent advantages, such as providing …
Grounded SAM: Assembling open-world models for diverse visual tasks
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to
combine with the segment anything model (SAM). This integration enables the detection and …
TAPTR: Tracking any point with transformers as detection
In this paper, we propose a simple yet effective approach for Tracking Any Point with
TRansformers (TAPTR). Based on the observation that point tracking bears a great …
OPEN: Object-wise position embedding for multi-view 3D object detection
Accurate depth information is crucial for enhancing the performance of multi-view 3D object
detection. Despite the success of some existing multi-view 3D detectors utilizing pixel-wise …
Context and geometry aware voxel transformer for semantic scene completion
Vision-based Semantic Scene Completion (SSC) has gained much attention due to its
widespread applications in various 3D perception tasks. Existing sparse-to-dense …
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
We present Multi-View Attentive Contextualization (MvACon), a simple yet effective
method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object …
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
This study delves into the realm of multi-modality (i.e., video and motion modalities) human
behavior understanding by leveraging the powerful capabilities of Large Language Models …
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
Multi-camera-based 3D object detection has made notable progress in the past several
years. However, we observe that there are cases (e.g., faraway regions) in which popular 2D …
LinkOcc: 3D Semantic Occupancy Prediction with Temporal Association
3D semantic occupancy has garnered considerable attention due to its abundant structural
information encompassing the entire autonomous driving scene. However, existing 3D …
CoreNet: Conflict Resolution Network for point-pixel misalignment and sub-task suppression of 3D LiDAR-camera object detection
Fusing multi-modality inputs from different sensors is an effective way to improve the
performance of 3D object detection. However, current methods overlook two important …