Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

AFTer-UNet: Axial fusion transformer UNet for medical image segmentation

X Yan, H Tang, S Sun, H Ma… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent advances in transformer-based models have drawn attention to exploring these
techniques in medical image segmentation, especially in conjunction with the U-Net model …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

TransFusion: Cross-view fusion with transformer for 3D human pose estimation

H Ma, L Chen, D Kong, Z Wang, X Liu, H Tang… - arXiv preprint arXiv …, 2021 - arxiv.org
Estimating the 2D human poses in each view is typically the first step in calibrated multi-view
3D pose estimation. But the performance of 2D pose detectors suffers from challenging …

Multi-task learning of object states and state-modifying actions from web videos

T Souček, JB Alayrac, A Miech, I Laptev… - IEEE Transactions on …, 2024 - computer.org
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …

Complementary parts contrastive learning for fine-grained weakly supervised object co-localization

L Ma, F Zhao, H Hong, L Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The aim of weakly supervised object co-localization is to locate different objects of the same
superclass in a dataset. Recent methods achieve impressive co-localization performance by …

Multi-task learning of object state changes from uncurated videos

T Souček, JB Alayrac, A Miech, I Laptev… - arXiv preprint arXiv …, 2022 - arxiv.org
We aim to learn to temporally localize object state changes and the corresponding state-
modifying actions by observing people interacting with objects in long uncurated web …

Permutation-aware activity segmentation via unsupervised frame-to-segment alignment

QH Tran, A Mehmood, M Ahmed… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents an unsupervised transformer-based framework for temporal activity
segmentation which leverages not only frame-level cues but also segment-level cues. This …

Leveraging triplet loss for unsupervised action segmentation

E Bueno-Benito, BT Vecino… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we propose a novel fully unsupervised framework that learns action
representations suitable for the action segmentation task from the single input video itself …