DeepMAD: Mathematical architecture design for deep convolutional neural network

X Shen, Y Wang, M Lin, Y Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The rapid advances in Vision Transformer (ViT) refresh the state-of-the-art performances in
various vision tasks, overshadowing the conventional CNN-based models. This ignites a few …

Diversifying spatial-temporal perception for video domain generalization

KY Lin, JR Du, Y Gao, J Zhou… - Advances in Neural …, 2024 - proceedings.neurips.cc
Video domain generalization aims to learn generalizable video classification models for
unseen target domains by training in a source domain. A critical challenge of video domain …

Temporally-adaptive models for efficient video understanding

Z Huang, S Zhang, L Pan, Z Qing, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Spatial convolutions are extensively used in numerous deep video models. It fundamentally
assumes spatio-temporal invariance, i.e., using shared weights for every location in different …

Multimodal cross-domain few-shot learning for egocentric action recognition

M Hatano, R Hachiuma, R Fujii, H Saito - European Conference on …, 2024 - Springer
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …

Privacy-safe Action Recognition via Cross-Modality Distillation

Y Kim, J Jung, H Noh, B Ahn, JH Kwon, DG Choi - IEEE Access, 2024 - ieeexplore.ieee.org
Human action recognition systems enhance public safety by detecting abnormal behavior
autonomously. RGB sensors commonly used in such systems capture personal information …

Dynamical semantic enhancement network for continuous sign language recognition

S Wang, L Guo, W Xue - Multimedia Systems, 2024 - Springer
In the field of sign language recognition, effective interpretation of semantic information,
which is primarily conveyed through facial and hand gestures, poses significant challenges …

STAN: Spatio-Temporal Analysis Network for efficient video action recognition

S Chen, X Wang, Y Sun, K Yang - Expert Systems with Applications, 2025 - Elsevier
Action recognition, whose goal is identifying and extracting spatio-temporal features from
video content, is a foundation of work in video understanding. However, current methods for …

Privacy-enhanced zero-shot learning via data-free knowledge transfer

R Gao, F Wan, D Organisciak, J Pu… - … on Multimedia and …, 2023 - ieeexplore.ieee.org
Considering the increasing concerns about data copyright and sensitivity issues, we present
a novel Privacy-Enhanced Zero-Shot Learning (PE-ZSL) paradigm. The key innovation is to …

Zero-Shot Proxy with Incorporated-Score for Lightweight Deep Neural Architecture Search

TT Nguyen, JH Han - Electronics, 2024 - mdpi.com
Designing a high-performance neural network is a difficult task. Neural architecture search
(NAS) methods aim to solve this process. However, the construction of a high-quality …

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Y Zhai, W Li, Y Tang, X Chen, Y Wang - arXiv preprint arXiv:2405.08344, 2024 - arxiv.org
Current architectures for video understanding mainly build upon 3D convolutional blocks or
2D convolutions with additional operations for temporal modeling. However, these methods …