Transformer for skeleton-based action recognition: A review of recent advances

W Xin, R Liu, Y Liu, Y Chen, W Yu, Q Miao - Neurocomputing, 2023 - Elsevier
Skeleton-based action recognition has rapidly become one of the most popular and
essential research topics in computer vision. The task is to analyze the characteristics of …

Action recognition based on RGB and skeleton data sets: A survey

R Yue, Z Tian, S Du - Neurocomputing, 2022 - Elsevier
Action recognition is a major branch of computer vision research. As a widely used
technology, action recognition has been applied to human–computer interaction, intelligent …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

F Sener, D Chatterjee, D Shelepov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed …

Revisiting the "video" in video-language understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural network is an emerging research topic in deep learning. Compared to static
models which have fixed computational graphs and parameters at the inference stage …

Video transformer network

D Neimark, O Bar, M Zohar… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents VTN, a transformer-based framework for video recognition. Inspired by
recent developments in vision transformers, we ditch the standard approach in video action …

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

Molo: Motion-augmented long-short contrastive learning for few-shot action recognition

X Wang, S Zhang, Z Qing, C Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Current state-of-the-art approaches for few-shot action recognition achieve promising
performance by conducting frame-level matching on learned visual features. However, they …

PIT: Progressive interaction transformer for pedestrian crossing intention prediction

Y Zhou, G Tan, R Zhong, Y Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
For autonomous driving, one of the major challenges is to predict pedestrian crossing
intention in ego-view. Pedestrian intention depends not only on their intrinsic goals but also …