3Mformer: Multi-order multi-mode transformer for skeletal action recognition

L Wang, P Koniusz - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Many skeletal action recognition models use GCNs to represent the human body as 3D
body joints connected into body parts. GCNs aggregate one- or few-hop graph neighbourhoods …
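
The snippet mentions GCNs aggregating one- or few-hop neighbourhoods over the joint graph. As a rough illustration only (not the 3Mformer model, and using a made-up toy skeleton), a single one-hop aggregation step could look like this:

```python
# Minimal sketch of one-hop GCN aggregation over a skeleton graph
# (illustrative only; not the 3Mformer architecture).
import torch

def one_hop_gcn_layer(x, adj, weight):
    """x: (N, J, C) joint features, adj: (J, J) adjacency with self-loops,
    weight: (C, C_out) learnable projection."""
    # Symmetrically normalise the adjacency: D^{-1/2} A D^{-1/2}
    deg = adj.sum(-1)
    d_inv_sqrt = deg.clamp(min=1e-6).pow(-0.5)
    adj_norm = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # Aggregate each joint's one-hop neighbourhood, then project
    return torch.relu(torch.einsum('ij,njc,co->nio', adj_norm, x, weight))

# toy skeleton with 3 joints in a chain: 0-1-2
adj = torch.tensor([[1., 1., 0.], [1., 1., 1.], [0., 1., 1.]])
x = torch.randn(2, 3, 8)            # batch of 2, 3 joints, 8 channels
w = torch.randn(8, 16)
out = one_hop_gcn_layer(x, adj, w)  # (2, 3, 16)
```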

k-NN attention-based video vision transformer for action recognition

W Sun, Y Ma, R Wang - Neurocomputing, 2024 - Elsevier
Action Recognition aims to understand human behavior and predict a label for each action.
Recently, Vision Transformer (ViT) has achieved remarkable performance on action …
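
The title refers to k-NN attention, i.e. each query attending only to its k most similar keys rather than the full token set. A minimal sketch of that idea, assuming a generic top-k masking formulation and not necessarily the paper's exact one, is shown below.

```python
# Minimal sketch of k-NN (top-k) attention: each query keeps only its
# k highest-scoring keys and masks the rest before the softmax.
import torch
import torch.nn.functional as F

def knn_attention(q, k, v, topk=8):
    """q, k, v: (B, T, d) token embeddings."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5     # (B, T, T)
    # k-th largest score per query row, used as a cut-off
    kth = scores.topk(topk, dim=-1).values[..., -1:]          # (B, T, 1)
    masked = scores.masked_fill(scores < kth, float('-inf'))
    attn = F.softmax(masked, dim=-1)
    return attn @ v

q = k = v = torch.randn(1, 32, 64)
out = knn_attention(q, k, v, topk=8)   # (1, 32, 64)
```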

DeMAAE: deep multiplicative attention-based autoencoder for identification of peculiarities in video sequences

N Aslam, MH Kolekar - The Visual Computer, 2024 - Springer
In videos, anomaly detection is challenging due to its diverse nature in different application
domains. Reconstruction and prediction-based methods have been widely employed to …
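
The snippet mentions reconstruction-based methods. As a generic illustration of that recipe (not the DeMAAE architecture), each frame is scored by how poorly an autoencoder fitted to normal data reconstructs it:

```python
# Minimal sketch of reconstruction-based anomaly scoring: frames that the
# autoencoder reconstructs poorly get a high anomaly score. The model here
# is untrained; in practice it is first fit on normal frames only.
import torch
import torch.nn as nn

class TinyAE(nn.Module):
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
        self.dec = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

ae = TinyAE()
frames = torch.randn(10, 256)                 # 10 flattened frame features
recon = ae(frames)
score = ((frames - recon) ** 2).mean(dim=1)   # higher => more anomalous
is_anomaly = score > score.mean() + 2 * score.std()   # simple threshold
```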

A novel spatiotemporal urban land change simulation model: Coupling transformer encoder, convolutional neural network, and cellular automata

H Li, Z Liu, X Lin, M Qin, S Ye, P Gao - Journal of Geographical Sciences, 2024 - Springer
The land use and land cover change (LUCC) process exhibits spatial correlation and temporal
dependency. Accurate extraction of spatiotemporal features is therefore important for enhancing the …

STRFormer: Spatial–Temporal–ReTemporal Transformer for 3D human pose estimation

X Liu, H Tang - Image and Vision Computing, 2023 - Elsevier
Transformer-based methods have emerged as the gold standard in 2D-to-3D human pose
estimation from video sequences, largely thanks to their powerful spatial–temporal feature …

Pose-guided robust action recognition for outdoor internet of things

J Yu, X Cheng, H Chen, Y Xu - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Skeleton-based human action recognition is a key technology for visual feedback, which can help
the Internet of Things (IoT) interact with humans in a non-contact manner outdoors. Graph …

Spatio-temporal self-supervision enhanced transformer networks for action recognition

Y Zhang, H Zhang, G Wu, J Li - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
With the development of deep neural networks, video action recognition has gradually
become a research hotspot in recent years. However, the additional temporal dimension in …

Video action recognition based on an enhanced-negative multi-granularity discrimination model

刘良振, 杨阳, 夏莹杰, 邝砾 - Journal on Communications (通信学报), 2024 - infocomm-journal.com
To improve the model's fine-grained discrimination of video actions, a contrastive-learning-based paradigm for discriminating enhanced negative examples is proposed. A set of enhanced negatives is generated for each video to supplement the hardest-to-distinguish video-text negative pairs. To further discriminate the enhanced negatives …
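
As a rough sketch of the paradigm the snippet describes, extra hard negatives can be appended to a standard video-text contrastive (InfoNCE-style) loss. The code below is an illustrative, simplified formulation with hypothetical names and shapes, not the paper's model:

```python
# Minimal sketch of a video-text contrastive loss supplemented with extra
# hard negatives generated per video (illustrative only).
import torch
import torch.nn.functional as F

def contrastive_with_extra_negatives(video, text, extra_neg, tau=0.07):
    """video, text: (B, d) paired embeddings; extra_neg: (B, M, d)
    additional negative text embeddings generated for each video."""
    video = F.normalize(video, dim=-1)
    text = F.normalize(text, dim=-1)
    extra_neg = F.normalize(extra_neg, dim=-1)
    sim = video @ text.t()                                # (B, B): diagonal = positives
    hard = torch.einsum('bd,bmd->bm', video, extra_neg)   # (B, M): extra negatives
    logits = torch.cat([sim, hard], dim=-1) / tau
    labels = torch.arange(video.shape[0])                 # positive is the diagonal entry
    return F.cross_entropy(logits, labels)

loss = contrastive_with_extra_negatives(
    torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 5, 128))
```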

Context-aware augmentation for contrastive self-supervised representation learning

H Sepanj, P Fieguth - Journal of Computational Vision …, 2023 - openjournals.uwaterloo.ca
Self-supervised representation learning is fundamental in modern machine learning;
however, existing approaches often rely on conventional random image augmentations …