Object detection using deep learning, CNNs and vision transformers: A review
Detecting objects remains one of computer vision and image understanding applications'
most fundamental and challenging aspects. Significant advances in object detection have …
most fundamental and challenging aspects. Significant advances in object detection have …
A survey of deep learning-based object detection methods and datasets for overhead imagery
Significant advancements and progress made in recent computer vision research enable
more effective processing of various objects in high-resolution overhead imagery obtained …
more effective processing of various objects in high-resolution overhead imagery obtained …
Grounding dino: Marrying dino with grounded pre-training for open-set object detection
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
Image as a foreign language: Beit pretraining for vision and vision-language tasks
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …
Eva: Exploring the limits of masked visual representation learning at scale
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …
Internimage: Exploring large-scale vision foundation models with deformable convolutions
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …
large-scale models based on convolutional neural networks (CNNs) are still in an early …
Universal instance perception as object discovery and retrieval
All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …
as category names, language expressions, and target annotations, but this complete field …
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …
As new approaches regarding architecture optimization and training optimization are …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
methods to data compression. Recent advances in statistical machine learning have opened …
Exploring plain vision transformer backbones for object detection
We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …
object detection. This design enables the original ViT architecture to be fine-tuned for object …