Towards global video scene segmentation with context-aware transformer
Videos such as movies or TV episodes usually need to divide the long storyline into
cohesive units, ie, scenes, to facilitate the understanding of video semantics. The key …
cohesive units, ie, scenes, to facilitate the understanding of video semantics. The key …
Multimodal high-order relation transformer for scene boundary detection
Scene boundary detection breaks down long videos into meaningful story-telling units and
plays a crucial role in high-level video understanding. Despite significant advancements in …
plays a crucial role in high-level video understanding. Despite significant advancements in …
Newsnet: A novel dataset for hierarchical temporal segmentation
Temporal video segmentation is the get-to-go automatic video analysis, which decomposes
a long-form video into smaller components for the following-up understanding tasks. Recent …
a long-form video into smaller components for the following-up understanding tasks. Recent …
Collaborative noisy label cleaner: Learning scene-aware trailers for multi-modal highlight detection in movies
Movie highlights stand out of the screenplay for efficient browsing and play a crucial role on
social media platforms. Based on existing efforts, this work has two observations:(1) For …
social media platforms. Based on existing efforts, this work has two observations:(1) For …
Characters link shots: character attention network for movie scene segmentation
Movie scene segmentation aims to automatically segment a movie into multiple story units,
ie, scenes, each of which is a series of semantically coherent and time-continual shots …
ie, scenes, each of which is a series of semantically coherent and time-continual shots …
Video Scene Detection Using Transformer Encoding Linker Network (TELNet)
This paper introduces a transformer encoding linker network (TELNet) for automatically
identifying scene boundaries in videos without prior knowledge of their structure. Videos …
identifying scene boundaries in videos without prior knowledge of their structure. Videos …
Movies2Scenes: Using movie metadata to learn scene representation
Understanding scenes in movies is crucial for a variety of applications such as video
moderation, search, and recommendation. However, labeling individual scenes is a time …
moderation, search, and recommendation. However, labeling individual scenes is a time …
Neighbor Relations Matter in Video Scene Detection
Video scene detection aims to temporally link shots for obtaining semantically compact
scenes. It is essential for this task to capture scene-distinguishable affinity among shots by …
scenes. It is essential for this task to capture scene-distinguishable affinity among shots by …
SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers
Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video
content. However, conventional static ADs can leave out detailed information in videos …
content. However, conventional static ADs can leave out detailed information in videos …
Mega: Multimodal alignment aggregation and distillation for cinematic video segmentation
Previous research has studied the task of segmenting cinematic videos into scenes and into
narrative acts. However, these studies have overlooked the essential task of multimodal …
narrative acts. However, these studies have overlooked the essential task of multimodal …