Transformer for skeleton-based action recognition: A review of recent advances

W Xin, R Liu, Y Liu, Y Chen, W Yu, Q Miao - Neurocomputing, 2023 - Elsevier
Skeleton-based action recognition has rapidly become one of the most popular and
essential research topics in computer vision. The task is to analyze the characteristics of …

RGB-D data-based action recognition: a review

MB Shaikh, D Chai - Sensors, 2021 - mdpi.com
Classification of human actions is an ongoing research problem in computer vision. This
review is aimed to scope current literature on data fusion and action recognition techniques …

CLIP2Video: Mastering video-text retrieval via image CLIP

H Fang, P Xiong, L Xu, Y Chen - arXiv preprint arXiv:2106.11097, 2021 - arxiv.org
We present CLIP2Video network to transfer the image-language pre-training model to video-
text retrieval in an end-to-end manner. Leading approaches in the domain of video-and …

3Mformer: Multi-order multi-mode transformer for skeletal action recognition

L Wang, P Koniusz - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Many skeletal action recognition models use GCNs to represent the human body by 3D
body joints connected body parts. GCNs aggregate one- or few-hop graph neighbourhoods …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Late temporal modeling in 3D CNN architectures with BERT for action recognition

ME Kalfaoglu, S Kalkan, AA Alatan - … : Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
In this work, we combine 3D convolution with late temporal modeling for action recognition.
For this aim, we replace the conventional Temporal Global Average Pooling (TGAP) layer at …

Two-stream consensus network for weakly-supervised temporal action localization

Y Zhai, L Wang, W Tang, Q Zhang, J Yuan… - Computer Vision–ECCV …, 2020 - Springer
Weakly-supervised Temporal Action Localization (W-TAL) aims to classify and
localize all action instances in an untrimmed video under only video-level supervision …

Transferring cross-domain knowledge for video sign language recognition

D Li, X Yu, C Xu, L Petersson… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Word-level sign language recognition (WSLR) is a fundamental task in sign language
interpretation. It requires models to recognize isolated sign words from videos. However …

Domain knowledge powered deep learning for breast cancer diagnosis based on contrast-enhanced ultrasound videos

C Chen, Y Wang, J Niu, X Liu, Q Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In recent years, deep learning has been widely used in breast cancer diagnosis, and many
high-performance models have emerged. However, most of the existing deep learning …

Fusing higher-order features in graph neural networks for skeleton-based action recognition

Z Qin, Y Liu, P Ji, D Kim, L Wang… - … on Neural Networks …, 2022 - ieeexplore.ieee.org
Skeleton sequences are lightweight and compact and thus are ideal candidates for action
recognition on edge devices. Recent skeleton-based action recognition methods extract …