Google Scholar

Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives

J Li, J Chen, Y Tang, C Wang, BA Landman… - Medical image …, 2023 - Elsevier

Transformer, one of the latest technological advances of deep learning, has gained
prevalence in natural language processing or computer vision. Since medical imaging bear …

Cited by 214

[PDF] arxiv.org

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2920

[PDF] thecvf.com

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

Cited by 682

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 381

[PDF] arxiv.org

VideoMamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - European Conference on …, 2024 - Springer

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Cited by 147

[PDF] thecvf.com

Humans in 4D: Reconstructing and tracking humans with transformers

S Goel, G Pavlakos, J Rajasegaran… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present an approach to reconstruct humans and track them over time. At the core of our
approach, we propose a fully "transformerized" version of a network for human mesh …

Cited by 172

[PDF] thecvf.com

Scaling vision transformers to gigapixel images via hierarchical self-supervised learning

RJ Chen, C Chen, Y Li, TY Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com

Vision Transformers (ViTs) and their multi-scale and hierarchical variations have
been successful at capturing image representations but their use has been generally …

Cited by 484

[PDF] neurips.cc

AdaptFormer: Adapting vision transformers for scalable visual recognition

S Chen, C Ge, Z Tong, J Wang… - Advances in …, 2022 - proceedings.neurips.cc

Pretraining Vision Transformers (ViTs) has achieved great success in visual
recognition. A following scenario is to adapt a ViT to various image and video recognition …

Cited by 608

[PDF] neurips.cc

Masked autoencoders as spatiotemporal learners

C Feichtenhofer, Y Li, K He - Advances in neural …, 2022 - proceedings.neurips.cc

This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …

Cited by 556

[PDF] arxiv.org

Exploring plain vision transformer backbones for object detection

Y Li, H Mao, R Girshick, K He - European conference on computer vision, 2022 - Springer

We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for
object detection. This design enables the original ViT architecture to be fine-tuned for object …

Cited by 915

Multiscale vision transformers