Temporal action segmentation: An analysis of modern techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …
AFTer-UNet: Axial fusion transformer UNet for medical image segmentation
Recent advances in transformer-based models have drawn attention to exploring these
techniques in medical image segmentation, especially in conjunction with the U-Net model …
Progress-aware online action segmentation for egocentric procedural task videos
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …
TransFusion: Cross-view fusion with transformer for 3D human pose estimation
Estimating the 2D human poses in each view is typically the first step in calibrated multi-view
3D pose estimation. But the performance of 2D pose detectors suffers from challenging …
Multi-task learning of object states and state-modifying actions from web videos
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …
Complementary parts contrastive learning for fine-grained weakly supervised object co-localization
L Ma, F Zhao, H Hong, L Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The aim of weakly supervised object co-localization is to locate different objects of the same
superclass in a dataset. Recent methods achieve impressive co-localization performance by …
Multi-task learning of object state changes from uncurated videos
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …
Permutation-aware activity segmentation via unsupervised frame-to-segment alignment
This paper presents an unsupervised transformer-based framework for temporal activity
segmentation which leverages not only frame-level cues but also segment-level cues. This …
Leveraging triplet loss for unsupervised action segmentation
In this paper, we propose a novel fully unsupervised framework that learns action
representations suitable for the action segmentation task from the single input video itself …