Rewrite the stars

X Ma, X Dai, Y Bai, Y Wang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent studies have drawn attention to the untapped potential of the "star
operation" (element-wise multiplication) in network design. While intuitive explanations …

Dynamic and static mutual fitting for action recognition

W Liu, X Jia, X Zhong, K Jiang, X Yu, M Ye - Pattern Recognition, 2025 - Elsevier
Action recognition is intended to classify a video into a certain category by aggregating and
summarizing its temporal and spatial information. Existing methods have achieved …

Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition

SN Gowda, A Arnab, J Huang - European Conference on Computer Vision, 2024 - Springer
In this paper, we address the challenges posed by the substantial training time and memory
consumption associated with video transformers, focusing on the ViViT (Video Vision …

GBC: Guided alignment and adaptive boosting CLIP bridging vision and language for robust action recognition

Z Yang, G An, Z Zheng, S Cao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The Contrastive Language-Image Pre-training (CLIP) model achieves strong generalization
by using a large number of text-image pairs for contrastive learning. However, when it is …

SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

W Huang, J Zhang, X Qian, Z Wu, M Wang… - Proceedings of the 32nd …, 2024 - dl.acm.org
High frame-rate (HFR) videos of action recognition improve fine-grained expression while
reducing the spatio-temporal relation and motion information density. Thus, large amounts of …

Distillation-free Scaling of Large SSMs for Images and Videos

H Suleman, ST Wasim, M Naseer, J Gall - arXiv preprint arXiv:2409.11867, 2024 - arxiv.org
State-space models (SSMs), exemplified by S4, have introduced a novel context modeling
method by integrating state-space techniques into deep learning. However, they struggle …

RaSTFormer: region-aware spatiotemporal transformer for visual homogenization recognition in short videos

S Zhang, J Zhang, H Zhang, L Zhuo - Neural Computing and Applications, 2024 - Springer
With the surge in network traffic, the homogenization of short video content is becoming
increasingly prominent, resulting in low-quality entertainment due to proliferation and …

Focal modulation networks for interpretable sound classification

L Della Libera, C Subakan… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
The increasing success of deep neural networks has raised concerns about their inherent
black-box nature, posing challenges related to interpretability and trust. While there has …

VT-Grapher: Video Tube Graph Network with Self-Distillation for Human Action Recognition

X Liu, J Liu, X Cheng, J Li, W Wan… - IEEE Sensors Journal, 2024 - ieeexplore.ieee.org
The proliferation of videos captured by sensor-based cameras has driven the application of
human action recognition (HAR) task. As the fundamental video application in human …

Focal-TSMP: deep learning for vegetation health prediction and agricultural drought assessment from a regional climate simulation

MH Shams Eddin, J Gall - Geoscientific Model Development, 2024 - gmd.copernicus.org
Satellite-derived agricultural drought indices can provide a complementary perspective of
terrestrial vegetation trends. In addition, their integration for drought assessments under …