- Academic Search

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org

Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

Cited by 134 - 2 versions

Advances in medical image analysis with vision transformers: a comprehensive review

R Azad, A Kazerouni, M Heidari, EK Aghdam… - Medical Image …, 2024 - Elsevier

The remarkable performance of the Transformer architecture in natural language processing
has recently also triggered broad interest in Computer Vision. Among other merits …

Cited by 144 - 7 versions

YOLOv9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HY Mark Liao - European conference on computer …, 2024 - Springer

Today's deep learning methods focus on how to design the objective functions to make the
prediction as close as possible to the target. Meanwhile, an appropriate neural network …

Cited by 1365 - 3 versions

Segment everything everywhere all at once

X Zou, J Yang, H Zhang, F Li, L Li… - Advances in …, 2024 - proceedings.neurips.cc

In this work, we present SEEM, a promptable and interactive model for segmenting
everything everywhere all at once in an image. In SEEM, we propose a novel and versatile …

Cited by 523 - 5 versions

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently the state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

Cited by 1007 - 5 versions

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Cited by 791 - 8 versions

Dual aggregation transformer for image super-resolution

Z Chen, Y Zhang, J Gu, L Kong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Transformer has recently gained considerable popularity in low-level vision tasks, including
image super-resolution (SR). These networks utilize self-attention along different …

Cited by 202 - 9 versions

Generalized decoding for pixel, image, and language

X Zou, ZY Dou, J Yang, Z Gan, L Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present X-Decoder, a generalized decoding model that can predict pixel-level
segmentation and language tokens seamlessly. X-Decoder takes as input two types of …

Cited by 250 - 6 versions

VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset

S Chen, H Li, Q Wang, Z Zhao… - Advances in Neural …, 2023 - proceedings.neurips.cc

Vision and text have been fully explored in contemporary video-text foundational models,
while other modalities such as audio and subtitles in videos have not received sufficient …

Cited by 101 - 6 versions

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in neural …, 2022 - proceedings.neurips.cc

Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …

Cited by 194 - 6 versions

DaViT: Dual attention vision transformers

Foundation Models Defining a New Era in Vision: a Survey and Outlook