Human action recognition from various data modalities: A review

Z Sun, Q Ke, H Rahmani, M Bennamoun… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Human Action Recognition (HAR) aims to understand human behavior and assign a label to
each action. It has a wide range of applications, and therefore has been attracting increasing …

Verbs in action: Improving verb understanding in video-language models

L Momeni, M Caron, A Nagrani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …

Vectorized evidential learning for weakly-supervised temporal action localization

J Gao, M Chen, C Xu - IEEE transactions on pattern analysis …, 2023 - ieeexplore.ieee.org
With the explosive growth of videos, weakly-supervised temporal action localization (WS-TAL)
task has become a promising research direction in pattern analysis and machine …

Learning state-aware visual representations from audible interactions

H Mittal, P Morgado, U Jain… - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose a self-supervised algorithm to learn representations from egocentric video data.
Recently, significant efforts have been made to capture humans interacting with their own …

Multi-task learning of object states and state-modifying actions from web videos

T Souček, JB Alayrac, A Miech, I Laptev… - IEEE Transactions on …, 2024 - computer.org
We aim to learn to temporally localize object state changes and the corresponding state-modifying
actions by observing people interacting with objects in long uncurated web …

Learning action changes by measuring verb-adverb textual relationships

D Moltisanti, F Keller, H Bilen… - Proceedings of the …, 2023 - openaccess.thecvf.com
The goal of this work is to understand the way actions are performed in videos. That is, given
a video, we aim to predict an adverb indicating a modification applied to the action (e.g. cut …

Multi-task learning of object state changes from uncurated videos

T Souček, JB Alayrac, A Miech, I Laptev… - arXiv preprint arXiv …, 2022 - arxiv.org
We aim to learn to temporally localize object state changes and the corresponding state-modifying
actions by observing people interacting with objects in long uncurated web …

Coarse or Fine? Recognising Action End States without Labels

D Moltisanti, H Bilen, L Sevilla-Lara… - Proceedings of the …, 2024 - openaccess.thecvf.com
We focus on the problem of recognising the end state of an action in an image which is
critical for understanding what action is performed and in which manner. We study this …

Video-adverb retrieval with compositional adverb-action embeddings

T Hummel, OB Mercea, A Koepke, Z Akata - arXiv preprint arXiv …, 2023 - arxiv.org
Retrieving adverbs that describe an action in a video poses a crucial step towards fine-grained
video understanding. We propose a framework for video-to-adverb retrieval (and …