Delving into the devils of bird's-eye-view perception: A review, evaluation and recipe
Learning powerful representations in bird's-eye-view (BEV) for perception tasks is trending
and drawing extensive attention both from industry and academia. Conventional …
YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems
This is a comprehensive review of the YOLO series of systems. Different from previous
literature surveys, this review article reexamines the characteristics of the YOLO series from …
Convolutions die hard: Open-vocabulary segmentation with single frozen convolutional CLIP
Open-vocabulary segmentation is a challenging task requiring segmenting and recognizing
objects from an open set of categories in diverse environments. One way to address this …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
Cut and learn for unsupervised object detection and instance segmentation
We propose Cut-and-LEaRn (CutLER), a simple approach for training
unsupervised object detection and segmentation models. We leverage the property of self …
Images speak in images: A generalist painter for in-context visual learning
In-context learning, as a new paradigm in NLP, allows the model to rapidly adapt to various
tasks with only a handful of prompts and examples. But in computer vision, the difficulties for …
SegGPT: Segmenting everything in context
We present SegGPT, a generalist model for segmenting everything in context. We unify
various segmentation tasks into a generalist in-context learning framework that …
Masked-attention mask transformer for universal image segmentation
Image segmentation groups pixels with different semantics, e.g., category or instance
membership. Each choice of semantics defines a task. While only the semantics of each task …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
Conditional DETR for fast training convergence
The recently developed DETR approach applies the transformer encoder and decoder
architecture to object detection and achieves promising performance. In this paper, we …