AerialFormer: Multi-resolution transformer for aerial image segmentation
When performing remote sensing image segmentation, practitioners often encounter various
challenges, such as a strong imbalance in the foreground–background, the presence of tiny …
VLTinT: Visual-linguistic transformer-in-transformer for coherent video paragraph captioning
Abstract Video Paragraph Captioning aims to generate a multi-sentence description of an
untrimmed video with multiple temporal event locations in a coherent storytelling. Following …
CLIP-TSA: CLIP-assisted temporal self-attention for weakly-supervised video anomaly detection
Video anomaly detection (VAD)–commonly formulated as a multiple-instance learning
problem in a weakly-supervised manner due to its labor-intensive nature–is a challenging …
ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection
Temporal action detection (TAD) involves the localization and classification of action
instances within untrimmed videos. While standard TAD follows fully supervised learning …
Multi-modal prompting for low-shot temporal action localization
In this paper, we consider the problem of temporal action localization under low-shot (zero-
shot & few-shot) scenario, with the goal of detecting and classifying the action instances from …
CNN-ViT supported weakly-supervised video segment level anomaly detection
Video anomaly event detection (VAED) is one of the key technologies in computer vision for
smart surveillance systems. With the advent of deep learning, contemporary advances in …
Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection
P Li, J Cao, L Yuan, Q Ye, X Xu - Pattern Recognition, 2023 - Elsevier
Detecting actions temporally in untrimmed videos is very challenging, and it accomplishes
action classification and localization simultaneously. Capturing the relations among action …
Contextual explainable video representation: Human perception-based understanding
Video understanding is a growing field and a subject of intense research, which includes
many interesting tasks for understanding both spatial and temporal information, e.g., action …
Anomaly Detection in Weakly Supervised Videos Using Multistage Graphs and General Deep Learning Based Spatial-Temporal Feature Enhancement
Weakly supervised video anomaly detection (WS-VAD) is a crucial research domain in
computer vision for the implementation of intelligent surveillance systems. Many researchers …
SolarFormer: Multi-scale transformer for solar PV profiling
As climate change intensifies, the global imperative to shift towards sustainable energy
sources becomes more pronounced. Photovoltaic (PV) energy is a favored choice due to its …