- Academic Search

[HTML] A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com

YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

Cited by 1930 · 6 versions

[PDF] arxiv.org

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Cited by 136 · 2 versions

[PDF] arxiv.org

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer

In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

Cited by 1596 · 4 versions

[PDF] arxiv.org

CogVLM: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …

Cited by 554 · 3 versions

[PDF] neurips.cc

GLIPv2: Unifying localization and vision-language understanding

H Zhang, P Zhang, X Hu, YC Chen… - Advances in …, 2022 - proceedings.neurips.cc

We present GLIPv2, a grounded VL understanding model, that serves both localization tasks
(e.g., object detection, instance segmentation) and Vision-Language (VL) understanding …

Cited by 314 · 4 versions

[PDF] thecvf.com

Grounded language-image pre-training

LH Li, P Zhang, H Zhang, J Yang, C Li… - Proceedings of the …, 2022 - openaccess.thecvf.com

This paper presents a grounded language-image pre-training (GLIP) model for learning
object-level, language-aware, and semantic-rich visual representations. GLIP unifies object …

Cited by 1158 · 8 versions

[PDF] thecvf.com

Vector quantized diffusion model for text-to-image synthesis

S Gu, D Chen, J Bao, F Wen, B Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation.
This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent …

Cited by 858 · 10 versions

[PDF] thecvf.com

ViM: Out-of-distribution with virtual-logit matching

H Wang, Z Li, L Feng, W Zhang - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Most of the existing Out-Of-Distribution (OOD) detection algorithms depend on single input
source: the feature, the logit, or the softmax probability. However, the immense diversity of …

Cited by 339 · 5 versions

[PDF] thecvf.com

Conceptual 12M: Pushing web-scale image-text pre-training to recognize long-tail visual concepts

S Changpinyo, P Sharma, N Ding… - Proceedings of the …, 2021 - openaccess.thecvf.com

The availability of large-scale image captioning and visual question answering datasets has
contributed significantly to recent successes in vision-and-language pre-training. However …

Cited by 1078 · 9 versions

[PDF] neurips.cc

DetCLIP: Dictionary-enriched visual-concept paralleled pre-training for open-world detection

L Yao, J Han, Y Wen, X Liang, D Xu… - Advances in …, 2022 - proceedings.neurips.cc

Open-world object detection, as a more general and challenging goal, aims to recognize
and localize objects described by arbitrary category names. The recent work GLIP …

Cited by 153 · 5 versions

Openimages: A public dataset for large-scale multi-label and multi-class image classification
