A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …
A survey on evaluation of large language models
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …
Segment anything
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
Visual instruction tuning
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …
Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …
Improved baselines with visual instruction tuning
Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …
Qwen-VL: A versatile vision-language model for understanding, localization, text reading, and beyond
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …
Segment everything everywhere all at once
In this work, we present SEEM, a promptable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …
Run, don't walk: chasing higher FLOPS for faster neural networks
To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
ShareGPT4V: Improving large multi-modal models with better captions
Modality alignment serves as the cornerstone for large multi-modal models (LMMs).
However, the impact of different attributes (e.g., data type, quality, and scale) of training data …