Deep learning-based action detection in untrimmed videos: A survey
Understanding human behavior and activity facilitates the advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …
BMN: Boundary-matching network for temporal action proposal generation
Temporal action proposal generation is a challenging and promising task which aims to
locate temporal regions in real-world videos where actions or events may occur. Current …
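In practice, "locating temporal regions" means producing candidate (start, end) intervals. The sketch below assumes each proposal and each ground-truth action instance is stored as a (start, end) pair in seconds. It scores a proposal against an annotation with temporal intersection-over-union (tIoU), the overlap measure commonly used to match proposals to ground truth, and then computes recall at a tIoU threshold. The helper names `temporal_iou` and `recall_at_threshold`, the 0.5 threshold, and the example intervals are hypothetical. This illustrates the task setting only and is not BMN's boundary-matching mechanism.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end) in seconds.
Interval = Tuple[float, float]


def temporal_iou(proposal: Interval, ground_truth: Interval) -> float:
    """Temporal intersection-over-union between two 1-D segments."""
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    # The overlap is clamped to zero when the segments do not intersect.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_threshold(
    proposals: List[Interval], ground_truths: List[Interval], threshold: float = 0.5
) -> float:
    """Fraction of ground-truth segments matched by at least one proposal."""
    if not ground_truths:
        return 0.0
    hits = sum(
        any(temporal_iou(p, g) >= threshold for p in proposals)
        for g in ground_truths
    )
    return hits / len(ground_truths)


if __name__ == "__main__":
    proposals = [(2.0, 7.0), (10.0, 14.0)]
    ground_truths = [(3.0, 8.0), (20.0, 25.0)]
    print(temporal_iou((2.0, 7.0), (3.0, 8.0)))  # overlap 4 s / union 6 s
    print(recall_at_threshold(proposals, ground_truths))  # 0.5
```

Only the first annotated action is recovered, so recall is 0.5. A proposal generator aims to raise this recall while keeping the number of proposals per video small.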
BSN: Boundary sensitive network for temporal action proposal generation
Temporal action proposal generation is an important yet challenging problem, since
temporal proposals with rich action content are indispensable for analysing real-world …
End-to-end dense video captioning with masked transformer
Dense video captioning aims to generate text descriptions for all events in an untrimmed
video. This involves both detecting and describing events. Therefore, all previous methods …
UntrimmedNets for weakly supervised action recognition and detection
Current action recognition methods heavily rely on trimmed videos for model training.
However, it is expensive and time-consuming to acquire a large-scale trimmed video …
VideoLLM: Modeling video sequence with large language models
With the exponential growth of video data, there is an urgent need for automated technology
to analyze and comprehend video content. However, existing video understanding models …
TURN TAP: Temporal unit regression network for temporal action proposals
We address the problem of Temporal Action Proposal (TAP) generation. This is an important
problem, as fast extraction of semantically important (e.g., human actions) segments from …
AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description
J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …
Deep learning for video-text retrieval: a review
Video-Text Retrieval (VTR) aims to search for the most relevant video related to the
semantics in a given sentence, and vice versa. In general, this retrieval task is composed of …
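The retrieval task can be shown as a ranking problem in both directions: text-to-video and, "vice versa", video-to-text. This minimal sketch assumes a text encoder and a video encoder have already mapped sentences and videos into one shared embedding space. It ranks candidates by cosine similarity. The embeddings, their 3-d size, and the helper names `cosine_similarity` and `rank` are hypothetical. The sketch does not reproduce any specific method covered by the review.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query: Vector, gallery: List[Vector]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical embeddings standing in for text- and video-encoder outputs.
    video_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
    text_embeddings = [[0.9, 0.1, 0.0], [0.0, 0.2, 1.0]]
    # Text-to-video: rank all videos for the first sentence.
    print(rank(text_embeddings[0], video_embeddings))  # [0, 2, 1]
    # Video-to-text: rank all sentences for the second video.
    print(rank(video_embeddings[1], text_embeddings))  # [1, 0]
```

The same similarity function serves both directions. Only the roles of query and gallery are swapped, which is why the task is usually described as symmetric.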
TallFormer: Temporal Action Localization with a Long-Memory Transformer
F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i)
short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …