TN-ZSTAD: Transferable network for zero-shot temporal activity detection
An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …
In the eye of beholder: Joint learning of gaze and actions in first person video
We address the task of jointly determining what a person is doing and where they are
looking based on the analysis of video captured by a headworn camera. We propose a …
Rt-gene: Real-time eye gaze estimation in natural environments
In this work, we consider the problem of robust gaze estimation in natural environments.
Large camera-to-subject distances and high variations in head pose and eye gaze angles …
Every moment counts: Dense detailed labeling of actions in complex videos
Every moment counts in action recognition. A comprehensive understanding of human
activity in video requires labeling every frame according to the actions occurring, placing …
Class semantics-based attention for action detection
Action localization networks are often structured as a feature encoder sub-network and a
localization sub-network, where the feature encoder learns to transform an input video to …
Action recognition in realistic sports videos
The ability to analyze the actions which occur in a video is essential for automatic
understanding of sports. Action localization and recognition in videos are two main research …
Detecting events and key actors in multi-person videos
Multi-person event recognition is a challenging task, often with many people active in the
scene but only a small subset contributing to an actual event. In this paper, we propose a …
Reinforcement learning for visual object detection
S Mathe, A Pirinen… - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
One of the most widely used strategies for visual object detection is based on exhaustive
spatial hypothesis search. While methods like sliding windows have been successful and …
Fast action proposals for human action detection and search
In this paper we target at generating generic action proposals in unconstrained videos. Each
action proposal corresponds to a temporal series of spatial bounding boxes, i.e., a spatio …
Gaze-enabled egocentric video summarization via constrained submodular maximization
With the proliferation of wearable cameras, the number of videos of users documenting their
personal lives using such devices is rapidly increasing. Since such videos may span hours …