A comprehensive survey of neural architecture search: Challenges and solutions

P Ren, Y **ao, X Chang, PY Huang, Z Li… - ACM Computing …, 2021 - dl.acm.org
Deep learning has made substantial breakthroughs in many fields due to its powerful
automatic representation capabilities. It has been proven that neural architecture design is …

A review of convolutional-neural-network-based action recognition

G Yao, T Lei, J Zhong - Pattern Recognition Letters, 2019 - Elsevier
Video action recognition is widely applied in video indexing, intelligent surveillance,
multimedia understanding, and other fields. Recently, it was greatly improved by …

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
The research progress in multimodal learning has grown rapidly over the last decade in
several areas, especially in computer vision. The growing potential of multimodal data …

Bmn: Boundary-matching network for temporal action proposal generation

T Lin, X Liu, X Li, E Ding, S Wen - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Temporal action proposal generation is an challenging and promising task which aims to
locate temporal regions in real-world videos where action or event may occur. Current …

Expansion-squeeze-excitation fusion network for elderly activity recognition

X Shu, J Yang, R Yan, Y Song - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
This work focuses on the task of elderly activity recognition, which is a challenging task due
to the existence of individual actions and human-object interactions in elderly activities …

Video mamba suite: State space model as a versatile alternative for video understanding

G Chen, Y Huang, J Xu, B Pei, Z Chen, Z Li… - arxiv preprint arxiv …, 2024 - arxiv.org
Understanding videos is one of the fundamental directions in computer vision research, with
extensive efforts dedicated to exploring various architectures such as RNN, 3D CNN, and …

Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet?

K Hara, H Kataoka, Y Satoh - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
The purpose of this study is to determine whether current video datasets have sufficient data
for training very deep convolutional neural networks (CNNs) with spatio-temporal three …

A^ 2-nets: Double attention networks

Y Chen, Y Kalantidis, J Li, S Yan… - Advances in neural …, 2018 - proceedings.neurips.cc
Learning to capture long-range relations is fundamental to image/video recognition. Existing
CNN models generally rely on increasing depth to model such relations which is highly …

Rethinking spatiotemporal feature learning: Speed-accuracy trade-offs in video classification

S **e, C Sun, J Huang, Z Tu… - Proceedings of the …, 2018 - openaccess.thecvf.com
Despite the steady progress in video analysis led by the adoption of convolutional neural
networks (CNNs), the relative improvement has been less drastic as that in 2D static image …

Bsn: Boundary sensitive network for temporal action proposal generation

T Lin, X Zhao, H Su, C Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Temporal action proposal generation is an important yet challenging problem, since
temporal proposals with rich action content are indispensable for analysing real-world …