A comprehensive survey of deep learning for image captioning

MDZ Hossain, F Sohel, MF Shiratuddin… - ACM Computing Surveys …, 2019 - dl.acm.org
Generating a description of an image is called image captioning. Image captioning requires
recognizing the important objects, their attributes, and their relationships in an image. It also …

A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions

SK Yadav, K Tiwari, HM Pandey, SA Akbar - Knowledge-Based Systems, 2021 - Elsevier
Human activity recognition (HAR) is one of the most important and challenging problems in
computer vision. It has critical applications in a wide variety of tasks including gaming …

Rescaling egocentric vision: Collection, pipeline and challenges for EPIC-KITCHENS-100

D Damen, H Doughty, GM Farinella, A Furnari… - International Journal of …, 2022 - Springer
This paper introduces the pipeline to extend the largest dataset in egocentric vision,
EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …

IntentNet: Learning to predict intention from raw sensor data

S Casas, W Luo, R Urtasun - Conference on Robot Learning, 2018 - proceedings.mlr.press
In order to plan a safe maneuver, self-driving vehicles need to understand the intent of other
traffic participants. We define intent as a combination of discrete high level behaviors as well …

The language of actions: Recovering the syntax and semantics of goal-directed human activities

H Kuehne, A Arslan, T Serre - Proceedings of the IEEE …, 2014 - openaccess.thecvf.com
This paper describes a framework for modeling human activities as temporally structured
processes. Our approach is motivated by the inherently hierarchical nature of human …

Review of eye tracking metrics involved in emotional and cognitive processes

V Skaramagkas, G Giannakakis… - IEEE Reviews in …, 2021 - ieeexplore.ieee.org
Eye behaviour provides valuable information revealing one's higher cognitive functions and
state of affect. Although eye tracking is gaining ground in the research community, it is not …

In the eye of beholder: Joint learning of gaze and actions in first person video

Y Li, M Liu, JM Rehg - Proceedings of the European …, 2018 - openaccess.thecvf.com
We address the task of jointly determining what a person is doing and where they are
looking based on the analysis of video captured by a headworn camera. We propose a …

Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks

H Wang, L Wang - Proceedings of the IEEE conference on …, 2017 - openaccess.thecvf.com
Recently, skeleton-based action recognition has gained popularity due to cost-effective
depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches …

Scaling egocentric vision: The EPIC-KITCHENS dataset

D Damen, H Doughty, GM Farinella… - Proceedings of the …, 2018 - openaccess.thecvf.com
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction
with objects, their attention, and even intention. However, progress in this challenging …

The EPIC-KITCHENS dataset: Collection, challenges and baselines

D Damen, H Doughty, GM Farinella… - … on Pattern Analysis …, 2020 - ieeexplore.ieee.org
Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest
egocentric video benchmark, offering a unique viewpoint on people's interaction with …