NTIRE 2023 challenge on efficient super-resolution: Methods and results

Y Li, Y Zhang, R Timofte, L Van Gool… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper reviews the NTIRE 2023 challenge on efficient single-image super-resolution
with a focus on the proposed solutions and results. The aim of this challenge is to devise a …

Visual ChatGPT: Talking, drawing and editing with visual foundation models

C Wu, S Yin, W Qi, X Wang, Z Tang, N Duan - arXiv preprint arXiv …, 2023 - arxiv.org
ChatGPT is attracting a cross-field interest as it provides a language interface with
remarkable conversational competency and reasoning capabilities across many domains …

The Segment Anything Model (SAM) for remote sensing applications: From zero to one shot

LP Osco, Q Wu, EL De Lemos, WN Gonçalves… - International Journal of …, 2023 - Elsevier
Segmentation is an essential step for remote sensing image processing. This study aims to
advance the application of the Segment Anything Model (SAM), an innovative image …

Expanding language-image pretrained models for general video recognition

B Ni, H Peng, M Chen, S Zhang, G Meng, J Fu… - European conference on …, 2022 - Springer
Contrastive language-image pretraining has shown great success in learning visual-textual
joint representation from web-scale data, demonstrating remarkable “zero-shot” …

Fine-tuned CLIP models are efficient video learners

H Rasheed, MU Khattak, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale multi-modal training with image-text pairs imparts strong generalization to CLIP
model. Since training on a similar scale for videos is infeasible, recent approaches focus on …

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …

DaViT: Dual attention vision transformers

M Ding, B Xiao, N Codella, P Luo, J Wang… - European conference on …, 2022 - Springer
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …

Point-M2AE: Multi-scale masked autoencoders for hierarchical point cloud pre-training

R Zhang, Z Guo, P Gao, R Fang… - Advances in neural …, 2022 - proceedings.neurips.cc
Masked Autoencoders (MAE) have shown great potential in self-supervised pre-training for
language and 2D image transformers. However, it still remains an open question on how to …

DilateFormer: Multi-scale dilated transformer for visual recognition

J Jiao, YM Tang, KY Lin, Y Gao, AJ Ma… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
As a de facto solution, the vanilla Vision Transformers (ViTs) are encouraged to model long-
range dependencies between arbitrary image patches while the global attended receptive …