Academic Search

DeepMAD: Mathematical architecture design for deep convolutional neural network

X Shen, Y Wang, M Lin, Y Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The rapid advances in Vision Transformer (ViT) refresh the state-of-the-art performances in
various vision tasks, overshadowing the conventional CNN-based models. This ignites a few …

Cited by 36 · All 8 versions

[PDF] neurips.cc

Diversifying spatial-temporal perception for video domain generalization

KY Lin, JR Du, Y Gao, J Zhou… - Advances in Neural …, 2024 - proceedings.neurips.cc

Video domain generalization aims to learn generalizable video classification models for
unseen target domains by training in a source domain. A critical challenge of video domain …

Cited by 15 · All 5 versions

[PDF] arxiv.org

Temporally-adaptive models for efficient video understanding

Z Huang, S Zhang, L Pan, Z Qing, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Spatial convolutions are extensively used in numerous deep video models. It fundamentally
assumes spatio-temporal invariance, i.e., using shared weights for every location in different …

Cited by 10 · All 3 versions
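
The snippet above describes the shared-weight assumption behind spatial convolutions in video models. A minimal plain-Python sketch (not code from the paper) makes it concrete: one 2D kernel is applied to every location of every frame. All names here (`conv2d_valid`, `shared_weight_video_conv`, `per_frame_scaled_conv`, `scales`) are hypothetical, and the per-frame kernel scaling is only an assumed illustration of weights that vary over time, not the temporally-adaptive method the paper proposes.

```python
# Minimal sketch (hypothetical helpers, not the paper's implementation).
# A 2D convolution applied frame by frame reuses ONE kernel for every pixel
# location in every frame: the "spatio-temporal invariance" assumption.

Frame = list[list[float]]
Kernel = list[list[float]]


def conv2d_valid(frame: Frame, kernel: Kernel) -> Frame:
    """Plain 'valid' 2D cross-correlation: no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def shared_weight_video_conv(video: list[Frame], kernel: Kernel) -> list[Frame]:
    """Apply the same kernel to every frame: weights are shared across time."""
    return [conv2d_valid(frame, kernel) for frame in video]


def per_frame_scaled_conv(video: list[Frame], kernel: Kernel,
                          scales: list[float]) -> list[Frame]:
    """Hypothetical relaxation: frame t uses the kernel multiplied by scales[t]."""
    if len(scales) != len(video):
        raise ValueError("need exactly one scale per frame")
    return [
        conv2d_valid(frame, [[w * s for w in row] for row in kernel])
        for frame, s in zip(video, scales)
    ]


if __name__ == "__main__":
    frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    video = [frame, frame]          # two identical frames
    kernel = [[1, 0], [0, 1]]       # sums each pixel with its lower-right neighbour
    print(shared_weight_video_conv(video, kernel))  # [[[6, 8], [12, 14]], [[6, 8], [12, 14]]]
```

With shared weights, identical frames always produce identical outputs; only a time-dependent kernel (here, the assumed per-frame scale) can respond differently to the same content at different time steps.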

[PDF] arxiv.org

Multimodal cross-domain few-shot learning for egocentric action recognition

M Hatano, R Hachiuma, R Fujii, H Saito - European Conference on …, 2024 - Springer

We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …

Cited by 2 · All 4 versions

[PDF] ieee.org

Privacy-safe Action Recognition via Cross-Modality Distillation

Y Kim, J Jung, H Noh, B Ahn, JH Kwon, DG Choi - IEEE Access, 2024 - ieeexplore.ieee.org

Human action recognition systems enhance public safety by detecting abnormal behavior
autonomously. RGB sensors commonly used in such systems capture personal information …

Cited by 1

[PDF] researchsquare.com

Dynamical semantic enhancement network for continuous sign language recognition

S Wang, L Guo, W Xue - Multimedia Systems, 2024 - Springer

In the field of sign language recognition, effective interpretation of semantic information,
which is primarily conveyed through facial and hand gestures, poses significant challenges …

Cited by 1 · All 4 versions

STAN: Spatio-Temporal Analysis Network for efficient video action recognition

S Chen, X Wang, Y Sun, K Yang - Expert Systems with Applications, 2025 - Elsevier

Action recognition, whose goal is identifying and extracting spatio-temporal features from
video content, is a foundation of work in video understanding. However, current methods for …

Privacy-enhanced zero-shot learning via data-free knowledge transfer

R Gao, F Wan, D Organisciak, J Pu… - … on Multimedia and …, 2023 - ieeexplore.ieee.org

Considering the increasing concerns about data copyright and sensitivity issues, we present
a novel Privacy-Enhanced Zero-Shot Learning (PE-ZSL) paradigm. The key innovation is to …

Cited by 3 · All 2 versions

[HTML] mdpi.com

Zero-Shot Proxy with Incorporated-Score for Lightweight Deep Neural Architecture Search

TT Nguyen, JH Han - Electronics, 2024 - mdpi.com

Designing a high-performance neural network is a difficult task. Neural architecture search
(NAS) methods aim to solve this process. However, the construction of a high-quality …

All 3 versions

[PDF] arxiv.org

No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding

Y Zhai, W Li, Y Tang, X Chen, Y Wang - arXiv preprint arXiv:2405.08344, 2024 - arxiv.org

Current architectures for video understanding mainly build upon 3D convolutional blocks or
2D convolutions with additional operations for temporal modeling. However, these methods …

All 2 versions

Maximizing spatio-temporal entropy of deep 3D CNNs for efficient video recognition

DeepMAD: Mathematical architecture design for deep convolutional neural network