LiDAR-LLM: Exploring the potential of large language models for 3D LiDAR understanding

S Yang, J Liu, R Zhang, M Pan, Z Guo, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, Large Language Models (LLMs) and Multimodal Large Language Models
(MLLMs) have shown promise in instruction following and 2D image understanding. While …

EdgeSAM: Prompt-in-the-loop distillation for on-device deployment of SAM

C Zhou, X Li, CC Loy, B Dai - arXiv preprint arXiv:2312.06660, 2023 - arxiv.org
This paper presents EdgeSAM, an accelerated variant of the Segment Anything Model
(SAM), optimized for efficient execution on edge devices with minimal compromise in …

Stream Query Denoising for Vectorized HD-Map Construction

S Wang, F Jia, W Mao, Y Liu, Y Zhao, Z Chen… - … on Computer Vision, 2024 - Springer
This paper introduces the Stream Query Denoising (SQD) strategy, a novel and general
approach for high-definition map (HD-map) construction. SQD is designed to improve the …

SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

M Dai, L Yang, Y Xu, Z Feng… - Advances in Neural …, 2025 - proceedings.neurips.cc
Visual grounding is a common vision task that involves grounding descriptive sentences to
the corresponding regions of an image. Most existing methods use independent image-text …

Empowering lightweight detectors: Orientation Distillation via anti-ambiguous spatial transformation for remote sensing images

Y Zhang, W Zhang, J Li, X Qi, X Lu, L Wang… - ISPRS Journal of …, 2024 - Elsevier
Knowledge distillation (KD) has been one of the most potential methods to
implement a lightweight detector, which plays a significant role in satellite in-orbit processing …

KD-DETR: Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Y Wang, X Li, S Weng, G Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
DETR is a novel end-to-end transformer architecture object detector which significantly
outperforms classic detectors when scaling up. In this paper we focus on the compression of …

Computer vision model compression techniques for embedded systems: A survey

A Lopes, FP dos Santos, D de Oliveira, M Schiezaro… - Computers & …, 2024 - Elsevier
Deep neural networks have consistently represented the state of the art in most computer
vision problems. In these scenarios, larger and more complex models have demonstrated …

D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement

Y Peng, H Li, P Wu, Y Zhang, X Sun, F Wu - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce D-FINE, a powerful real-time object detector that achieves outstanding
localization precision by redefining the bounding box regression task in DETR models. D …

Distilling Knowledge from Large-Scale Image Models for Object Detection

G Li, W Wang, X Li, Z Li, J Yang, J Dai, Y Qiao… - … on Computer Vision, 2024 - Springer
Large-scale image models have made great progress in recent years, pushing the
boundaries of many vision tasks, e.g., object detection. Considering that deploying large …

DHS-DETR: Efficient DETRs with dynamic head switching

H Chen, C Tang, X Hu - Computer Vision and Image Understanding, 2024 - Elsevier
Detection Transformer (DETR) and its variants have emerged as a new paradigm for object
detection, but their high computational cost hinders practical applications. By investigating …