A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021‏ - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
the computer vision. It has critical application in wide variety of tasks including gaming …

Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023‏ - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

Moviechat: From dense token to sparse memory for long video understanding

E Song, W Chai, G Wang, Y Zhang… - Proceedings of the …, 2024‏ - openaccess.thecvf.com
Recently integrating video foundation models and large language models to build a video
understanding system can overcome the limitations of specific pre-defined vision tasks. Yet …

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

F Sener, D Chatterjee, D Shelepov… - Proceedings of the …, 2022‏ - openaccess.thecvf.com
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101" take-apart" toy vehicles. Participants work without fixed …

Next-qa: Next phase of question-answering to explaining temporal actions

J **ao, X Shang, A Yao… - Proceedings of the IEEE …, 2021‏ - openaccess.thecvf.com
We introduce NExT-QA, a rigorously designed video question answering (VideoQA)
benchmark to advance video understanding from describing to explaining the temporal …

Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021‏ - openaccess.thecvf.com
Abstract We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Diffusion action segmentation

D Liu, Q Li, AD Dinh, T Jiang… - Proceedings of the …, 2023‏ - openaccess.thecvf.com
Temporal action segmentation is crucial for understanding long-form videos. Previous works
on this task commonly adopt an iterative refinement paradigm by using multi-stage models …

Videollm: Modeling video sequence with large language models

G Chen, YD Zheng, J Wang, J Xu, Y Huang… - arxiv preprint arxiv …, 2023‏ - arxiv.org
With the exponential growth of video data, there is an urgent need for automated technology
to analyze and comprehend video content. However, existing video understanding models …

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y **ong, C Wu… - arxiv preprint arxiv …, 2020‏ - arxiv.org
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Longvlm: Efficient long video understanding via large language models

Y Weng, M Han, H He, X Chang, B Zhuang - European Conference on …, 2024‏ - Springer
Abstract Empowered by Large Language Models (LLMs), recent advancements in Video-
based LLMs (VideoLLMs) have driven progress in various video understanding tasks. These …