A comprehensive survey of vision-based human action recognition methods
Although widely used in many applications, accurate and efficient human action recognition
remains a challenging area of research in the field of computer vision. Most recent surveys …
A comprehensive survey of scene graphs: Generation and application
A scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …
CLIP-Event: Connecting text and images with event structures
Vision-language (V+L) pretraining models have achieved great success in
supporting multimedia applications by understanding the alignments between images and …
Reconstructing hands in 3D with transformers
We present an approach that can reconstruct hands in 3D from monocular input. Our
approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture …
Learning human-object interactions by graph parsing neural networks
This paper addresses the task of detecting and recognizing human-object interactions (HOI)
in images and videos. We introduce the Graph Parsing Neural Network (GPNN), a …
DRG: Dual relation graph for human-object interaction detection
We tackle the challenging problem of human-object interaction (HOI) detection. Existing
methods either recognize the interaction of each human-object pair in isolation or perform …
Neural motifs: Scene graph parsing with global context
We investigate the problem of producing structured graph representations of visual scenes.
Our work analyzes the role of motifs: regularly appearing substructures in scene graphs. We …
Understanding human hands in contact at internet scale
Hands are the central means by which humans manipulate their world and being able to
reliably extract hand state information from Internet videos of humans engaged in their …
AVA: A video dataset of spatio-temporally localized atomic visual actions
This paper introduces a video dataset of spatio-temporally localized Atomic Visual Actions
(AVA). The AVA dataset densely annotates 80 atomic visual actions in 437 15-minute video …
Scene graph generation by iterative message passing
Understanding a visual scene goes beyond recognizing individual objects in isolation.
Relationships between objects also constitute rich semantic information about the scene. In …