Google Scholar

Transformers in vision: A survey

S Khan, M Naseer, M Hayat, SW Zamir… - ACM computing …, 2022 - dl.acm.org

Astounding results from Transformer models on natural language tasks have intrigued the
vision community to study their application to computer vision problems. Among their salient …

Cited by 2980 · All 8 versions

[PDF] arxiv.org

Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Cited by 653 · All 18 versions

[PDF] thecvf.com

Occformer: Dual-path transformer for vision-based 3d semantic occupancy prediction

Y Zhang, Z Zhu, D Du - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

The vision-based perception for autonomous driving has undergone a transformation from
the bird-eye-view (BEV) representations to the 3D semantic occupancy. Compared with the …

Cited by 171 · All 5 versions

[PDF] arxiv.org

Visual prompt tuning

M Jia, L Tang, BC Chen, C Cardie, S Belongie… - European conference on …, 2022 - Springer

The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …

Cited by 1816 · All 8 versions

[PDF] arxiv.org

Actionformer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Cited by 429 · All 9 versions

[PDF] thecvf.com

Mvitv2: Improved multiscale vision transformers for classification and detection

Y Li, CY Wu, H Fan, K Mangalam… - Proceedings of the …, 2022 - openaccess.thecvf.com

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …

Cited by 867 · All 7 versions

[PDF] thecvf.com

Flava: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com

State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …

Cited by 741 · All 8 versions

[PDF] thecvf.com

Star-transformer: a spatio-temporal cross attention transformer for human action recognition

D Ahn, S Kim, H Hong, BC Ko - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

In action recognition, although the combination of spatio-temporal videos and skeleton
features can improve the recognition performance, a separate model and balancing feature …

Cited by 162 · All 6 versions

[PDF] thecvf.com

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Cited by 1573 · All 6 versions

[PDF] thecvf.com

Vivit: A video vision transformer

A Arnab, M Dehghani, G Heigold… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present pure-transformer based models for video classification, drawing upon the recent
success of such models in image classification. Our model extracts spatio-temporal tokens …

Cited by 2718 · All 10 versions
