Object detection using deep learning, CNNs and vision transformers: A review

AB Amjoud, M Amrouch - IEEE Access, 2023 - ieeexplore.ieee.org

Detecting objects remains one of computer vision and image understanding applications'
most fundamental and challenging aspects. Significant advances in object detection have …

Cited by 162 · All 2 versions

[PDF] ieee.org

A survey of deep learning-based object detection methods and datasets for overhead imagery

J Kang, S Tariq, H Oh, SS Woo - IEEE Access, 2022 - ieeexplore.ieee.org

Significant advancements and progress made in recent computer vision research enable
more effective processing of various objects in high-resolution overhead imagery obtained …

Cited by 75 · All 3 versions

[PDF] arxiv.org

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer

In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Cited by 1580 · All 4 versions

[PDF] thecvf.com

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com

A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

Cited by 449 · All 5 versions

[PDF] thecvf.com

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Cited by 697 · All 5 versions

[PDF] thecvf.com

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Cited by 791 · All 8 versions

[PDF] thecvf.com

Universal instance perception as object discovery and retrieval

B Yan, Y Jiang, J Wu, D Wang, P Luo… - Proceedings of the …, 2023 - openaccess.thecvf.com

All instance perception tasks aim at finding certain objects specified by some queries such
as category names, language expressions, and target annotations, but this complete field …

Cited by 165 · All 5 versions

[PDF] thecvf.com

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Real-time object detection is one of the most important research topics in computer vision.
As new approaches regarding architecture optimization and training optimization are …

Cited by 9453 · All 10 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

This paper presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models
that demonstrate vision and vision-language capabilities …

Cited by 212 · All 6 versions

[PDF] arxiv.org

Exploring plain vision transformer backbones for object detection

Y Li, H Mao, R Girshick, K He - European conference on computer vision, 2022 - Springer

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …

Cited by 910 · All 6 versions
