Temporal action segmentation: An analysis of modern techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …
minutes-long videos with multiple action classes. As a long-range video understanding task …
Maxim: Multi-axis mlp for image processing
Recent progress on Transformers and multi-layer perceptron (MLP) models provide new
network architectural designs for computer vision tasks. Although these models proved to be …
network architectural designs for computer vision tasks. Although these models proved to be …
Multi-stage progressive image restoration
Image restoration tasks demand a complex balance between spatial details and high-level
contextualized information while recovering images. In this paper, we propose a novel …
contextualized information while recovering images. In this paper, we propose a novel …
Assembly101: A large-scale multi-view video dataset for understanding procedural activities
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101" take-apart" toy vehicles. Participants work without fixed …
assembling and disassembling 101" take-apart" toy vehicles. Participants work without fixed …
Hoi4d: A 4d egocentric dataset for category-level human-object interaction
We present HOI4D, a large-scale 4D egocentric dataset with rich annotations, to catalyze the
research of category-level human-object interaction. HOI4D consists of 2.4 M RGB-D …
research of category-level human-object interaction. HOI4D consists of 2.4 M RGB-D …
Diffusion action segmentation
Temporal action segmentation is crucial for understanding long-form videos. Previous works
on this task commonly adopt an iterative refinement paradigm by using multi-stage models …
on this task commonly adopt an iterative refinement paradigm by using multi-stage models …
Unified fully and timestamp supervised temporal action segmentation via sequence to sequence translation
This paper introduces a unified framework for video action segmentation via sequence to
sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to …
sequence (seq2seq) translation in a fully and timestamp supervised setup. In contrast to …
Bridge-prompt: Towards ordinal action understanding in instructional videos
Action recognition models have shown a promising capability to classify human actions in
short video clips. In a real scenario, multiple correlated human actions commonly occur in …
short video clips. In a real scenario, multiple correlated human actions commonly occur in …
Error detection in egocentric procedural task videos
We present a new egocentric procedural error dataset containing videos with various types
of errors as well as normal videos and propose a new framework for procedural error …
of errors as well as normal videos and propose a new framework for procedural error …
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Modeling long-term context in videos is crucial for many fine-grained tasks including
temporal action segmentation. An interesting question that is still open is how much long …
temporal action segmentation. An interesting question that is still open is how much long …