Google Scholar

Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Cited by 653 | All versions (18)

[PDF] arxiv.org

Vision transformers for action recognition: A survey

A Ulhaq, N Akhtar, G Pogrebna, A Mian - arXiv …

… and recognition are important components of visual scene understanding, e.g., for
object detection and semantic segmentation. With end-to-end deep learning systems …

Cited by 564 | All versions (6)

[PDF] arxiv.org

Frozen clip models are efficient video learners

Z Lin, S Geng, R Zhang, P Gao, G De Melo… - … on Computer Vision, 2022 - Springer

Video recognition has been dominated by the end-to-end learning paradigm–first initializing
a video recognition model with weights of a pretrained image model and then conducting …

Cited by 240 | All versions (7)

[PDF] thecvf.com

Masked feature prediction for self-supervised visual pre-training

C Wei, H Fan, S Xie, CY Wu, A Yuille… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training
of video models. Our approach first randomly masks out a portion of the input sequence and …

Cited by 741 | All versions (6)
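The MaskFeat snippet describes only its first step, randomly masking out a portion of the input sequence; the rest of the abstract is truncated. As a rough illustration of that masking step alone, the sketch below replaces a uniformly random fraction of a token sequence with a placeholder vector. The function name `random_mask`, the `mask_ratio` parameter, uniform sampling without replacement, and the fixed placeholder mask vector are hypothetical assumptions for illustration, not the paper's actual masking strategy or code.

```python
import random
from typing import List, Optional, Sequence, Tuple


def random_mask(
    tokens: Sequence[Sequence[float]],
    mask_ratio: float,
    mask_token: Sequence[float],
    seed: Optional[int] = None,
) -> Tuple[List[List[float]], List[bool]]:
    """Hypothetical sketch: replace a random subset of tokens with `mask_token`.

    Returns the masked sequence and a boolean list (True = masked) that a
    pre-training loss could use to pick which positions to predict.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    n = len(tokens)
    # Number of positions to mask, rounded down.
    num_masked = int(n * mask_ratio)
    rng = random.Random(seed)
    # Assumption: positions are sampled uniformly without replacement.
    masked_positions = set(rng.sample(range(n), num_masked))
    is_masked = [i in masked_positions for i in range(n)]
    # Masked positions get a copy of the placeholder vector; others are copied as-is.
    masked = [
        list(mask_token) if is_masked[i] else list(tok)
        for i, tok in enumerate(tokens)
    ]
    return masked, is_masked


# Usage: 8 dummy 4-dimensional "patch embeddings", half of them masked.
tokens = [[float(i)] * 4 for i in range(8)]
masked, is_masked = random_mask(tokens, mask_ratio=0.5, mask_token=[0.0] * 4, seed=0)
print(sum(is_masked))  # 4
```

In a real model the placeholder would typically be a learned embedding rather than a constant vector; the constant here only keeps the sketch dependency-free.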

[PDF] arxiv.org

Florence: A new foundation model for computer vision

L Yuan, D Chen, YL Chen, N Codella, X Dai… - arXiv preprint arXiv …, 2021 - arxiv.org

Automated visual understanding of our diverse and open world demands computer vision
models to generalize well with minimal customization for specific tasks, similar to human …

Cited by 958 | All versions (2)

[PDF] thecvf.com

Multiview transformers for video recognition

S Yan, X Xiong, A Arnab, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com

Video understanding requires reasoning at multiple spatiotemporal resolutions, from short
fine-grained motions to events taking place over longer durations. Although transformer …

Cited by 335 | All versions (8)

[PDF] arxiv.org

Open-world object manipulation using pre-trained vision-language models

A Stone, T Xiao, Y Lu, K Gopalakrishnan… - arXiv preprint arXiv …, 2023 - arxiv.org

For robots to follow instructions from people, they must be able to connect the rich semantic
information in human vocabulary, e.g., "can you get me the pink stuffed whale?", to their …

Cited by 142 | All versions (4)

Saved in my library

Tokenlearner: What can 8 learned tokens do for images and videos?

Human action recognition from various data modalities: A review

Vision transformers for action recognition: A survey

Frozen clip models are efficient video learners

Masked feature prediction for self-supervised visual pre-training

Florence: A new foundation model for computer vision

Multiview transformers for video recognition

Open-world object manipulation using pre-trained vision-language models