Google Scholar

Transformers in remote sensing: A survey

AA Aleissaee, A Kumar, RM Anwer, S Khan… - Remote Sensing, 2023 - mdpi.com

Deep learning-based algorithms have seen a massive popularity in different areas of remote
sensing image analysis over the past decade. Recently, transformer-based architectures …

Cited by 194 · All 7 versions

[HTML] sciencedirect.com

[HTML] Deep learning based computer vision approaches for smart agricultural applications

VG Dhanya, A Subeesh, NL Kushwaha… - Artificial Intelligence in …, 2022 - Elsevier

The agriculture industry is undergoing a rapid digital transformation and is growing powerful
by the pillars of cutting-edge approaches like artificial intelligence and allied technologies …

Cited by 197 · All 5 versions

[PDF] thecvf.com

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com

A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

Cited by 452 · All 5 versions

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Cited by 433 · All 6 versions

[PDF] arxiv.org

How far are we to GPT-4V? Closing the gap to commercial multimodal models with open-source suites

Z Chen, W Wang, H Tian, S Ye, Z Gao, E Cui… - Science China …, 2024 - Springer

In this paper, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …

Cited by 359 · All 2 versions

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 384 · All 7 versions

[PDF] arxiv.org

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Cited by 149 · All 2 versions

[PDF] thecvf.com

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Cited by 709 · All 5 versions

[PDF] arxiv.org

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier

We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

Cited by 229 · All 3 versions

[PDF] neurips.cc

Masked autoencoders as spatiotemporal learners

C Feichtenhofer, Y Li, K He - Advances in neural …, 2022 - proceedings.neurips.cc

This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …

Cited by 560 · All 5 versions
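The MAE snippet describes randomly masking out spacetime units of a video. A minimal Python sketch of that idea follows, assuming the clip has already been cut into a t × h × w grid of spacetime patches and that masking is sampled uniformly over the whole grid, with no special treatment of time or space. The function names, the 8 × 14 × 14 grid, and the 0.9 masking ratio are illustrative assumptions, not details taken from this listing.

```python
import random


def random_spacetime_mask(t, h, w, mask_ratio=0.9, seed=None):
    """Split the flat indices of a t x h x w patch grid into (visible, masked).

    Hypothetical helper: every spacetime patch has the same chance of being
    masked, i.e. sampling ignores the temporal/spatial layout of the grid.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_patches = t * h * w
    num_masked = int(round(num_patches * mask_ratio))
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # uniform random permutation of all patch indices
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])  # only these would reach the encoder
    return visible, masked


def index_to_thw(idx, h, w):
    """Map a flat patch index back to (time, row, col) grid coordinates."""
    t, rem = divmod(idx, h * w)
    row, col = divmod(rem, w)
    return t, row, col


if __name__ == "__main__":
    # Illustrative grid: 8 temporal x 14 x 14 spatial patches (assumed sizes).
    visible, masked = random_spacetime_mask(8, 14, 14, mask_ratio=0.9, seed=0)
    print(len(visible), len(masked))  # 157 1411
    print([index_to_thw(i, 14, 14) for i in visible[:3]])
```

With a high masking ratio only a small fraction of patches (157 of 1568 here) remains visible, which is what makes encoding the visible set cheap in MAE-style training; the decoder side is out of scope for this sketch.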

MViTv2: Improved multiscale vision transformers for classification and detection