Learn to be efficient: Build structured sparsity in large language models

H Zheng, X Bai, X Liu, ZM Mao… - Advances in …, 2025 - proceedings.neurips.cc
Large Language Models (LLMs) have achieved remarkable success with their
billion-level parameters, yet they incur high inference overheads. The emergence of …

VPA: Fully test-time visual prompt adaptation

J Sun, M Ibrahim, M Hall, I Evtimov, ZM Mao… - Proceedings of the 31st …, 2023 - dl.acm.org
Textual prompt tuning has demonstrated significant performance improvements in adapting
natural language processing models to a variety of downstream tasks by treating hand …

Cohere3D: Exploiting temporal coherence for unsupervised representation learning of vision-based autonomous driving

Y Xie, H Chen, GP Meyer, YJ Lee, EM Wolff… - arXiv preprint arXiv …, 2024 - arxiv.org
Due to the lack of depth cues in images, multi-frame inputs are important for the success of
vision-based perception, prediction, and planning in autonomous driving. Observations from …

Panoptic perception for autonomous driving: A survey

Y Li, L Xu - arXiv preprint arXiv:2408.15388, 2024 - arxiv.org
Panoptic perception represents a forefront advancement in autonomous driving technology,
unifying multiple perception tasks into a singular, cohesive framework to facilitate a thorough …

S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving

MK Wozniak, H Govindarajan, M Klingner… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent self-supervised clustering-based pre-training techniques like DINO and CrIBo have
shown impressive results for downstream detection and segmentation tasks. However, real …

Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection

M Khurana, N Peri, D Ramanan… - arXiv preprint arXiv …, 2024 - neeharperi.com
State-of-the-art 3D object detectors are often trained on massive labeled datasets. However,
annotating 3D bounding boxes remains prohibitively expensive and time-consuming …

Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection

M Khurana, N Peri, J Hays, D Ramanan - arXiv preprint arXiv:2406.10115, 2024 - arxiv.org
State-of-the-art 3D object detectors are often trained on massive labeled datasets. However,
annotating 3D bounding boxes remains prohibitively expensive and time-consuming …

CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning

R Chen, H Zhang, A Ravichandran, W Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
Unsupervised 3D representation learning via masked-and-reconstruction with differentiable
rendering is promising to reduce the labeling burden for fusion 3D perception. However …

Learning Shared RGB-D Fields: Unified Self-supervised Pre-training for Label-efficient LiDAR-Camera 3D Perception

X Xu, Y Li, T Zhang, J Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Constructing large-scale labeled datasets for multi-modal perception model training in
autonomous driving presents significant challenges. This has motivated the development of …

Finetuning Pre-trained Model with Limited Data for LiDAR-based 3D Object Detection by Bridging Domain Gaps

J Jang, M Chang, J Park, J Kim - 2024 IEEE/RSJ International …, 2024 - ieeexplore.ieee.org
LiDAR-based 3D object detectors have been largely utilized in various applications,
including autonomous vehicles or mobile robots. However, LiDAR-based detectors often fail …