- Academic Search

F Shamshad, S Khan, SW Zamir, MH Khan… - Medical Image …, 2023 - Elsevier

Following unprecedented success on the natural language tasks, Transformers have been
successfully applied to several computer vision problems, achieving state-of-the-art results …

Speichern Zitieren Zitiert von: 756 Ähnliche Artikel Alle 9 Versionen

[Free GPT-4]

[PDF] springer.com

Perceptual video quality assessment: A survey

X Min, H Duan, W Sun, Y Zhu, G Zhai - Science China Information …, 2024 - Springer

Perceptual video quality assessment plays a vital role in the field of video processing due to
the existence of quality degradations introduced in various stages of video signal …

Speichern Zitieren Zitiert von: 79 Ähnliche Artikel Alle 2 Versionen

[Free GPT-4]

[PDF] arxiv.org

Yolov9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HY Mark Liao - European conference on computer …, 2024 - Springer

Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …

Speichern Zitieren Zitiert von: 1365 Ähnliche Artikel Alle 3 Versionen

[Free GPT-4]

[PDF] thecvf.com

Run, don't walk: chasing higher FLOPS for faster neural networks

J Chen, S Kao, H He, W Zhuo, S Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com

To design fast neural networks, many works have been focusing on reducing the number of
floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …

Speichern Zitieren Zitiert von: 1170 Ähnliche Artikel Alle 10 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Open-vocabulary panoptic segmentation with text-to-image diffusion models

J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …

Speichern Zitieren Zitiert von: 424 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract This work presents Depth Anything a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules we aim to build a simple yet …

Speichern Zitieren Zitiert von: 579 Ähnliche Artikel Alle 6 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Convnext v2: Co-designing and scaling convnets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Speichern Zitieren Zitiert von: 677 Ähnliche Artikel Alle 8 Versionen HTML-Version

[Free GPT-4]

[PDF] arxiv.org

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Speichern Zitieren Zitiert von: 145 Ähnliche Artikel Alle 2 Versionen

[Free GPT-4]

[PDF] arxiv.org

Vision mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

Recently the state space models (SSMs) with efficient hardware-aware designs, ie, the
Mamba deep learning model, have shown great potential for long sequence modeling …

Speichern Zitieren Zitiert von: 1007 Ähnliche Artikel Alle 5 Versionen HTML-Version

[Free GPT-4]

[PDF] thecvf.com

Eva: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B **e, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Speichern Zitieren Zitiert von: 697 Ähnliche Artikel Alle 5 Versionen HTML-Version

Alert erstellen

Zitieren

Erweiterte Suche

In „Meine Bibliothek“ gespeichert

A convnet for the 2020s

Transformers in medical imaging: A survey

Perceptual video quality assessment: A survey

Yolov9: Learning what you want to learn using programmable gradient information

Run, don't walk: chasing higher FLOPS for faster neural networks

Open-vocabulary panoptic segmentation with text-to-image diffusion models

Depth anything: Unleashing the power of large-scale unlabeled data

Convnext v2: Co-designing and scaling convnets with masked autoencoders

Videomamba: State space model for efficient video understanding

Vision mamba: Efficient visual representation learning with bidirectional state space model

Eva: Exploring the limits of masked visual representation learning at scale