Human action recognition from various data modalities: A review
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …
Verbs in action: Improving verb understanding in video-language models
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …
Vectorized evidential learning for weakly-supervised temporal action localization
With the explosive growth of videos, weakly-supervised temporal action localization (WS-TAL)
task has become a promising research direction in pattern analysis and machine …
Learning state-aware visual representations from audible interactions
We propose a self-supervised algorithm to learn representations from egocentric video data.
Recently, significant efforts have been made to capture humans interacting with their own …
Multi-task learning of object states and state-modifying actions from web videos
We aim to learn to temporally localize object state changes and the corresponding
state-modifying actions by observing people interacting with objects in long uncurated web …
Learning action changes by measuring verb-adverb textual relationships
The goal of this work is to understand the way actions are performed in videos. That is, given
a video, we aim to predict an adverb indicating a modification applied to the action (eg cut" …
Multi-task learning of object state changes from uncurated videos
We aim to learn to temporally localize object state changes and the corresponding
state-modifying actions by observing people interacting with objects in long uncurated web …
Coarse or Fine? Recognising Action End States without Labels
We focus on the problem of recognising the end state of an action in an image which is
critical for understanding what action is performed and in which manner. We study this …
Video-adverb retrieval with compositional adverb-action embeddings
Retrieving adverbs that describe an action in a video poses a crucial step towards
fine-grained video understanding. We propose a framework for video-to-adverb retrieval (and …