Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

STAR-Transformer: a spatio-temporal cross attention transformer for human action recognition

D Ahn, S Kim, H Hong, BC Ko - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In action recognition, although the combination of spatio-temporal videos and skeleton
features can improve the recognition performance, a separate model and balancing feature …

MeMOT: Multi-object tracking with memory

J Cai, M Xu, W Li, Y …

Keeping your eye on the ball: Trajectory attention in video transformers

M Patrick, D Campbell, Y Asano… - Advances in neural …, 2021 - proceedings.neurips.cc
In video transformers, the time dimension is often treated in the same way as the two spatial
dimensions. However, in a scene where objects or the camera may move, a physical point …

Online human motion analysis in industrial context: A review

T Benmessabih, R Slama, V Havard… - Engineering Applications of …, 2024 - Elsevier
Human motion analysis plays a crucial role in industry 4.0 and, more recently, in industry 5.0
where human-centered applications are becoming increasingly important, demonstrating its …

PhysFormer++: Facial video-based physiological measurement with SlowFast temporal difference transformer

Z Yu, Y Shen, J Shi, H Zhao, Y Cui, J Zhang… - International Journal of …, 2023 - Springer
Remote photoplethysmography (rPPG), which aims at measuring heart activities and
physiological signals from facial video without any contact, has great potential in many …

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …

VideoLLM: Modeling video sequence with large language models

G Chen, YD Zheng, J Wang, J Xu, Y Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the exponential growth of video data, there is an urgent need for automated technology
to analyze and comprehend video content. However, existing video understanding models …