Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Revolutionizing digital pathology with the power of generative artificial intelligence and foundation models
Digital pathology has transformed the traditional pathology practice of analyzing tissue
under a microscope into a computer vision workflow. Whole-slide imaging allows …
Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
YOLO-World: Real-time open-vocabulary object detection
The You Only Look Once (YOLO) series of detectors have established themselves as efficient
and practical tools. However, their reliance on predefined and trained object …
Recognize anything: A strong image tagging model
We present the Recognize Anything Model (RAM): a strong foundation model for image
tagging. RAM makes a substantial step for foundation models in computer vision …
Self-regulating prompts: Foundational model adaptation without forgetting
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …
MaPLe: Multi-modal prompt learning
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …
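The MaPLe snippet above notes that CLIP-style models are sensitive to the wording of their text prompts. As a minimal sketch of that effect (not taken from any of the listed papers), the Python example below scores one image against the same label set under two different prompt templates, assuming the Hugging Face transformers CLIP API; the image path and labels are purely illustrative.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a public CLIP checkpoint (zero-shot, no fine-tuning).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # hypothetical input image
labels = ["cat", "dog", "bird"]    # illustrative label set

# Score the same labels under two prompt templates; the predicted
# class probabilities can shift noticeably between templates.
for template in ("a photo of a {}", "a blurry photo of a {}"):
    prompts = [template.format(label) for label in labels]
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(template, probs.squeeze().tolist())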
CORA: Adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching
Open-vocabulary detection (OVD) is an object detection task aiming at detecting objects
from novel categories beyond the base categories on which the detector is trained. Recent …