A comprehensive survey of deep learning for image captioning
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …
A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions
Human activity recognition (HAR) is one of the most important and challenging problems in
the computer vision. It has critical application in wide variety of tasks including gaming …
Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …
Intentnet: Learning to predict intention from raw sensor data
In order to plan a safe maneuver, self-driving vehicles need to understand the intent of other
traffic participants. We define intent as a combination of discrete high level behaviors as well …
The language of actions: Recovering the syntax and semantics of goal-directed human activities
This paper describes a framework for modeling human activities as temporally structured
processes. Our approach is motivated by the inherently hierarchical nature of human …
Review of eye tracking metrics involved in emotional and cognitive processes
Eye behaviour provides valuable information revealing one's higher cognitive functions and
state of affect. Although eye tracking is gaining ground in the research community, it is not …
In the eye of beholder: Joint learning of gaze and actions in first person video
We address the task of jointly determining what a person is doing and where they are
looking based on the analysis of video captured by a headworn camera. We propose a …
Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks
Recently, skeleton based action recognition gains more popularity due to cost-effective
depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches …
Scaling egocentric vision: The epic-kitchens dataset
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction
with objects, their attention, and even intention. However, progress in this challenging …
The epic-kitchens dataset: Collection, challenges and baselines
Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest
egocentric video benchmark, offering a unique viewpoint on people's interaction with …