Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier
The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe

H Li, C Sima, J Dai, W Wang, L Lu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention both from industry and academia. Conventional …

Grounding dino: Marrying dino with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Detrs beat yolos on real-time object detection

Y Zhao, W Lv, S Xu, J Wei, G Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The YOLO series has become the most popular framework for real-time object detection due
to its reasonable trade-off between speed and accuracy. However we observe that the …

Yolov10: Real-time end-to-end object detection

A Wang, H Chen, L Liu, K Chen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Over the past years, YOLOs have emerged as the predominant paradigm in the field of real-
time object detection owing to their effective balance between computational cost and …

Segment anything in high quality

L Ke, M Ye, M Danelljan, YW Tai… - Advances in Neural …, 2023 - proceedings.neurips.cc
Abstract The recent Segment Anything Model (SAM) represents a big leap in scaling up
segmentation models, allowing for powerful zero-shot capabilities and flexible prompting …

Grounded sam: Assembling open-world models for diverse visual tasks

T Ren, S Liu, A Zeng, J Lin, K Li, H Cao, J Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to
combine with the segment anything model (SAM). This integration enables the detection and …

Diffusiondet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Exploring object-centric temporal modeling for efficient multi-view 3d object detection

S Wang, Y Liu, T Wang, Y Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we propose a long-sequence modeling framework, named StreamPETR, for
multi-view 3D object detection. Built upon the sparse query design in the PETR series, we …

Detrs with collaborative hybrid assignments training

Z Zong, G Song, Y Liu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we provide the observation that too few queries assigned as positive samples
in DETR with one-to-one set matching leads to sparse supervision on the encoder's output …