AerialFormer: Multi-resolution transformer for aerial image segmentation

T Hanyu, K Yamazaki, M Tran, RA McCann, H Liao… - Remote Sensing, 2024 - mdpi.com
When performing remote sensing image segmentation, practitioners often encounter various
challenges, such as a strong imbalance in the foreground–background, the presence of tiny …

VLTinT: Visual-linguistic transformer-in-transformer for coherent video paragraph captioning

K Yamazaki, K Vo, QS Truong, B Raj… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Video Paragraph Captioning aims to generate a multi-sentence description of an
untrimmed video with multiple temporal event locations in a coherent storytelling. Following …

CLIP-TSA: CLIP-assisted temporal self-attention for weakly-supervised video anomaly detection

HK Joo, K Vo, K Yamazaki, N Le - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Video anomaly detection (VAD), commonly formulated as a multiple-instance learning
problem in a weakly-supervised manner due to its labor-intensive nature, is a challenging …

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

T Phan, K Vo, D Le, G Doretto… - Proceedings of the …, 2024 - openaccess.thecvf.com
Temporal action detection (TAD) involves the localization and classification of action
instances within untrimmed videos. While standard TAD follows fully supervised learning …

Multi-modal prompting for low-shot temporal action localization

C Ju, Z Li, P Zhao, Y Zhang, X Zhang, Q Tian… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we consider the problem of temporal action localization under low-shot (zero-
shot & few-shot) scenario, with the goal of detecting and classifying the action instances from …

CNN-ViT supported weakly-supervised video segment level anomaly detection

MH Sharif, L Jiao, CW Omlin - Sensors, 2023 - mdpi.com
Video anomaly event detection (VAED) is one of the key technologies in computer vision for
smart surveillance systems. With the advent of deep learning, contemporary advances in …

Truncated attention-aware proposal networks with multi-scale dilation for temporal action detection

P Li, J Cao, L Yuan, Q Ye, X Xu - Pattern Recognition, 2023 - Elsevier
Detecting actions temporally in untrimmed videos is very challenging, and it accomplishes
action classification and localization simultaneously. Capturing the relations among action …

Contextual explainable video representation: Human perception-based understanding

K Vo, K Yamazaki, PX Nguyen… - 2022 56th Asilomar …, 2022 - ieeexplore.ieee.org
Video understanding is a growing field and a subject of intense research, which includes
many interesting tasks to understanding both spatial and temporal information, e.g., action …

Anomaly Detection in Weakly Supervised Videos Using Multistage Graphs and General Deep Learning Based Spatial-Temporal Feature Enhancement

J Shin, Y Kaneko, ASM Miah, N Hassan… - IEEE …, 2024 - ieeexplore.ieee.org
Weakly supervised video anomaly detection (WS-VAD) is a crucial research domain in
computer vision for the implementation of intelligent surveillance systems. Many researchers …

SolarFormer: Multi-scale transformer for solar PV profiling

A De Luis, M Tran, T Hanyu, A Tran… - … on Smart Grid …, 2024 - ieeexplore.ieee.org
As climate change intensifies, the global imperative to shift towards sustainable energy
sources becomes more pronounced. Photovoltaic (PV) energy is a favored choice due to its …