Dual-view data hallucination with semantic relation guidance for few-shot image recognition

H Wu, G Ye, Z Zhou, L Tian, Q Wang, L Lin - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to recognize novel concepts from just a few image samples is very challenging as
the learned model is easily overfitted on the few data and results in poor generalizability …

UMT-net: A uniform multi-task network with adaptive task weighting

S Chen, L Zheng, L Huang, J Bai… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article introduces a versatile multi-task learning framework (UMT-Net) and an adaptive
task weighting (ATW) training method, specifically designed for resource-constrained …

A novel model for fall detection and action recognition combined lightweight 3D-CNN and convolutional LSTM networks

C Su, J Wei, D Lin, L Kong, YL Guan - Pattern Analysis and Applications, 2024 - Springer
Three-dimensional convolutional neural networks (3D-CNNs) and full connection long short-term
memory networks (FC-LSTMs) have been demonstrated as a kind of powerful non …

SMTDKD: A Semantic-Aware Multimodal Transformer Fusion Decoupled Knowledge Distillation Method for Action Recognition

Z Quan, Q Chen, W Wang, M Zhang, X Li… - IEEE Sensors …, 2023 - ieeexplore.ieee.org
Multimodal sensors, including vision sensors and wearable sensors, offer valuable
complementary information for accurate recognition tasks. Nonetheless, the heterogeneity …

Worker abnormal behavior recognition based on spatio-temporal graph convolution and attention model

Z Li, A Zhang, F Han, J Zhu, Y Wang - Electronics, 2023 - mdpi.com
In response to the problem where many existing research models only consider acquiring
the temporal information between sequences of continuous skeletons and in response to the …

Clustering-based multi-featured self-supervised learning for human activities and video retrieval

MH Javed, Z Yu, TM Rajeh, F Rafique, T Li - Applied Intelligence, 2024 - Springer
Human-centric content-based video retrieval has emerged as a prominent area of research
due to its diverse applications. However, this task presents several inherent challenges …

GBC: Guided Alignment and Adaptive Boosting CLIP Bridging Vision and Language for Robust Action Recognition

Z Yang, G An, Z Zheng, S Cao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The Contrastive Language-Image Pre-training (CLIP) model achieves strong generalization
by using a large number of text-image pairs for contrastive learning. However, when it is …

ER-C3D: Enhancing R-C3D Network With Adaptive Shrinkage and Symmetrical Multiscale for Behavior Detection

Z Huang, M Tao, N An, M Hu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Behavior detection receives considerable attention in real-life human–computer interaction,
where the complexity of background information and the variable durations of movements …

Spatiotemporal feature enhancement network for action recognition

G Huang, X Wang, X Li, Y Wang - Multimedia Tools and Applications, 2024 - Springer
As a hot topic in the field of computer vision, video action recognition has great application
potential, such as intelligent monitoring, data recommendation and virtual reality. However …

Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

S Lee, S Woo, MA Nugroho, C Kim - arXiv preprint arXiv:2311.12344, 2023 - arxiv.org
Due to the distinctive characteristics of sensors, each modality exhibits unique physical
properties. For this reason, in the context of multi-modal action recognition, it is important to …