YOLOv10: Real-time end-to-end object detection

A Wang, H Chen, L Liu, K Chen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-time
object detection owing to their effective balance between computational cost and …

RepViT: Revisiting mobile CNN from ViT perspective

A Wang, H Chen, Z Lin, J Han… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently lightweight Vision Transformers (ViTs) demonstrate superior performance
and lower latency compared with lightweight Convolutional Neural Networks (CNNs) on …

Grounded SAM: Assembling open-world models for diverse visual tasks

T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to
combine with the segment anything model (SAM). This integration enables the detection and …

DiffusionDet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

DETRs with collaborative hybrid assignments training

Z Zong, G Song, Y Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we provide the observation that too few queries assigned as positive samples
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …

Dense distinct query for end-to-end object detection

S Zhang, X Wang, J Wang, J Pang… - Proceedings of the …, 2023 - openaccess.thecvf.com
One-to-one label assignment in object detection has successfully obviated the need of non-maximum
suppression (NMS) as a postprocessing and makes the pipeline end-to-end …

CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching

X Wu, F Zhu, R Zhao, H Li - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects
from novel categories beyond the base categories on which the detector is trained. Recent …

SparseBEV: High-performance sparse 3D object detection from multi-camera videos

H Liu, Y Teng, T Lu, H Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn great
attention over the past few years. Dense detectors typically follow a two-stage pipeline by …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

MapTRv2: An end-to-end framework for online vectorized HD map construction

B Liao, S Chen, Y Zhang, B Jiang, Q Zhang… - International Journal of …, 2024 - Springer
High-definition (HD) map provides abundant and precise static environmental information of
the driving scene, serving as a fundamental and indispensable component for planning in …