Google Scholar

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier

Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bear …

Cited by 223 · All 8 versions

[PDF] arxiv.org

Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org

The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Cited by 156 · All 4 versions

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 389 · All 8 versions

[PDF] arxiv.org

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Cited by 160 · All 7 versions

[PDF] thecvf.com

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Cited by 932 · All 9 versions

[PDF] thecvf.com

Humans in 4D: Reconstructing and tracking humans with transformers

S Goel, G Pavlakos, J Rajasegaran… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present an approach to reconstruct humans and track them over time. At the core of our
approach, we propose a fully "transformerized" version of a network for human mesh …

Cited by 178 · All 7 versions

[PDF] thecvf.com

EfficientSAM: Leveraged masked image pretraining for efficient Segment Anything

Y Xiong, B Varadarajan, L Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Segment Anything Model (SAM) has emerged as a powerful tool for numerous
vision applications. A key component that drives the impressive performance for zero-shot …

Cited by 128 · All 5 versions

[PDF] arxiv.org

ClimaX: A foundation model for weather and climate

T Nguyen, J Brandstetter, A Kapoor, JK Gupta… - arXiv preprint arXiv …, 2023 - arxiv.org

Most state-of-the-art approaches for weather and climate modeling are based on physics-informed
numerical models of the atmosphere. These approaches aim to model the non …

Cited by 292 · All 10 versions

[PDF] mlr.press

Hiera: A hierarchical vision transformer without the bells-and-whistles

C Ryali, YT Hu, D Bolya, C Wei, H Fan… - International …, 2023 - proceedings.mlr.press

Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …

Cited by 145 · All 7 versions

[PDF] thecvf.com

MA-LMM: Memory-augmented large multimodal model for long-term video understanding

B He, H Li, YK Jang, M Jia, X Cao… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the success of large language models (LLMs), integrating the vision model into LLMs to
build vision-language foundation models has gained much more interest recently. However …

Cited by 68 · All 6 versions
