Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection

S Liu, Z Zeng, T Ren, F Li, H Zhang, J Yang… - … on Computer Vision, 2024 - Springer
In this paper, we develop an open-set object detector, called Grounding DINO, by marrying
Transformer-based detector DINO with grounded pre-training, which can detect arbitrary …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …
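
The "zero-init attention" named in this entry refers to gating learned adaption prompts with a factor initialized to zero, so training starts from the unmodified pretrained model. Below is a minimal, simplified sketch of that idea in PyTorch; it is not the authors' code, and all module and parameter names are illustrative.

```python
# Simplified sketch of zero-init gated prompt attention (illustrative, not the
# LLaMA-Adapter implementation): learnable prompt tokens contribute to attention
# only through a gate that starts at zero.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ZeroInitPromptAttention(nn.Module):
    def __init__(self, dim: int, num_prompts: int = 10):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.gate = nn.Parameter(torch.zeros(1))  # zero-initialized gating factor

    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # Standard attention over the original keys/values (shapes: batch x seq x dim).
        base = F.scaled_dot_product_attention(q, k, v)
        # Attention over the learnable prompts, scaled by a gate that is 0 at init,
        # so the module is a no-op when training begins.
        p = self.prompts.unsqueeze(0).expand(q.size(0), -1, -1)
        prompt_out = F.scaled_dot_product_attention(q, p, p)
        return base + torch.tanh(self.gate) * prompt_out
```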

Repurposing diffusion-based image generators for monocular depth estimation

B Ke, A Obukhov, S Huang, N Metzger… - Proceedings of the …, 2024 - openaccess.thecvf.com
Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth
from a single image is geometrically ill-posed and requires scene understanding, so it is not …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …
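
The core wiring described here, a frozen pretrained block paired with a trainable copy connected through zero-initialized convolutions, can be sketched roughly as follows. This is an illustrative simplification under assumed shapes, not the official ControlNet implementation.

```python
# Rough sketch of the ControlNet-style locked/trainable-copy pattern (illustrative):
# the conditioning branch enters and exits through 1x1 "zero convolutions", so the
# combined module matches the frozen block exactly at initialization.
import copy
import torch
import torch.nn as nn


def zero_conv(channels: int) -> nn.Conv2d:
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv


class ControlledBlock(nn.Module):
    def __init__(self, pretrained_block: nn.Module, channels: int):
        super().__init__()
        self.locked = pretrained_block                    # frozen original weights
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.trainable = copy.deepcopy(pretrained_block)  # trainable copy
        self.zero_in = zero_conv(channels)                # condition enters via zero conv
        self.zero_out = zero_conv(channels)               # branch output via zero conv

    def forward(self, x: torch.Tensor, condition: torch.Tensor) -> torch.Tensor:
        y = self.locked(x)
        c = self.trainable(x + self.zero_in(condition))
        return y + self.zero_out(c)                       # equals y at initialization


# Example usage with a hypothetical channel-preserving block:
# block = ControlledBlock(nn.Conv2d(64, 64, 3, padding=1), channels=64)
```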

A visual-language foundation model for computational pathology

MY Lu, B Chen, DFK Williamson, RJ Chen, I Liang… - Nature Medicine, 2024 - nature.com
The accelerated adoption of digital pathology and advances in deep learning have enabled
the development of robust models for various pathology tasks across a diverse array of …

Multi-concept customization of text-to-image diffusion

N Kumari, B Zhang, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
While generative models produce high-quality images of concepts learned from a large-
scale database, a user often wishes to synthesize instantiations of their own concepts (for …

MaPLe: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …