A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Foundation Models Defining a New Era in Vision: A Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying the
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

DETRs beat YOLOs on real-time object detection

Y Zhao, W Lv, S Xu, J Wei, G Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The YOLO series has become the most popular framework for real-time object detection due
to its reasonable trade-off between speed and accuracy. However, we observe that the …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B **e, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Scaling open-vocabulary object detection

M Minderer, A Gritsenko… - Advances in Neural …, 2023 - proceedings.neurips.cc
Open-vocabulary object detection has benefited greatly from pretrained vision-language
models, but is still limited by the amount of available detection training data. While detection …