Google Scholar

A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier

Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks, including gaming …

Cited by: 270

Convolutional neural networks or vision transformers: Who will win the race for action recognitions in visual data?

O Moutik, H Sekkat, S Tigani, A Chehri, R Saadane… - Sensors, 2023 - mdpi.com

Understanding actions in videos remains a significant challenge in computer vision, which
has been the subject of several pieces of research in the last decades. Convolutional neural …

Cited by: 72

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by: 387

VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc

Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …

Cited by: 1167

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

Cited by: 421

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org

The canonical approach to video action recognition requires a neural model to perform a classic
and standard 1-of-N majority vote task. Such models are trained to predict a fixed set of predefined …

Cited by: 453

UniFormer: Unified transformer for efficient spatiotemporal representation learning

K Li, Y Wang, P Gao, G Song, Y Liu, H Li… - arXiv preprint arXiv …, 2022 - arxiv.org

It is a challenging task to learn rich and multi-scale spatiotemporal semantics from
high-dimensional videos, due to large local redundancy and complex global dependency …

Cited by: 333

Continuous sign language recognition with correlation network

L Hu, L Gao, Z Liu, W Feng - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Human body trajectories are a salient cue to identify actions in video. Such body trajectories
are mainly conveyed by hands and face across consecutive frames in sign language …

Cited by: 86

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

Cited by: 92

TDN: Temporal difference networks for efficient action recognition

L Wang, Z Tong, B Ji, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Temporal modeling remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed Temporal Difference Network …

Cited by: 515

Saved to your library

TEINet: Towards an efficient architecture for video recognition
