Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

BMN: Boundary-matching network for temporal action proposal generation

T Lin, X Liu, X Li, E Ding, S Wen - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Temporal action proposal generation is a challenging and promising task which aims to
locate temporal regions in real-world videos where action or event may occur. Current …

BSN: Boundary sensitive network for temporal action proposal generation

T Lin, X Zhao, H Su, C Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Temporal action proposal generation is an important yet challenging problem, since
temporal proposals with rich action content are indispensable for analysing real-world …

End-to-end dense video captioning with masked transformer

L Zhou, Y Zhou, JJ Corso… - Proceedings of the …, 2018 - openaccess.thecvf.com
Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …

UntrimmedNets for weakly supervised action recognition and detection

L Wang, Y Xiong, D Lin… - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
Current action recognition methods heavily rely on trimmed videos for model training.
However, it is expensive and time-consuming to acquire a large-scale trimmed video …

VideoLLM: Modeling video sequence with large language models

G Chen, YD Zheng, J Wang, J Xu, Y Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the exponential growth of video data, there is an urgent need for automated technology
to analyze and comprehend video content. However, existing video understanding models …

TURN TAP: Temporal unit regression network for temporal action proposals

J Gao, Z Yang, K Chen, C Sun… - Proceedings of the …, 2017 - openaccess.thecvf.com
We address the problem of Temporal Action Proposal (TAP) generation. This is an important
problem, as fast extraction of semantically important (e.g., human actions) segments from …

AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …

Deep learning for video-text retrieval: a review

C Zhu, Q Jia, W Chen, Y Guo, Y Liu - International Journal of Multimedia …, 2023 - Springer
Video-Text Retrieval (VTR) aims to search for the most relevant video related to the
semantics in a given sentence, and vice versa. In general, this retrieval task is composed of …

TallFormer: Temporal Action Localization with a Long-Memory Transformer

F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …