Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

MAXIM: Multi-axis MLP for image processing

Z Tu, H Talebi, H Zhang, F Yang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent progress on Transformers and multi-layer perceptron (MLP) models provide new
network architectural designs for computer vision tasks. Although these models proved to be …

Multi-stage progressive image restoration

SW Zamir, A Arora, S Khan, M Hayat… - Proceedings of the …, 2021 - openaccess.thecvf.com
Image restoration tasks demand a complex balance between spatial details and high-level
contextualized information while recovering images. In this paper, we propose a novel …

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

F Sener, D Chatterjee, D Shelepov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed …

HOI4D: A 4D egocentric dataset for category-level human-object interaction

Y Liu, Y Liu, C Jiang, K Lyu, W Wan… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the
research of category-level human-object interaction. HOI4D consists of 2.4 M RGB-D …

Diffusion action segmentation

D Liu, Q Li, AD Dinh, T Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal action segmentation is crucial for understanding long-form videos. Previous works
on this task commonly adopt an iterative refinement paradigm by using multi-stage models …

Unified fully and timestamp supervised temporal action segmentation via sequence to sequence translation

N Behrmann, SA Golestaneh, Z Kolter, J Gall… - European conference on …, 2022 - Springer
This paper introduces a unified framework for video action segmentation via sequence to
sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to …

Bridge-prompt: Towards ordinal action understanding in instructional videos

M Li, L Chen, Y Duan, Z Hu, J Feng… - Proceedings of the …, 2022 - openaccess.thecvf.com
Action recognition models have shown a promising capability to classify human actions in
short video clips. In a real scenario, multiple correlated human actions commonly occur in …

Error detection in egocentric procedural task videos

SP Lee, Z Lu, Z Zhang, M Hoai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a new egocentric procedural error dataset containing videos with various types
of errors as well as normal videos and propose a new framework for procedural error …

How Much Temporal Long-Term Context is Needed for Action Segmentation?

E Bahrami, G Francesca, J Gall - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Modeling long-term context in videos is crucial for many fine-grained tasks including
temporal action segmentation. An interesting question that is still open is how much long …