Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Multiple instance learning: A survey of problem characteristics and applications

MA Carbonneau, V Cheplygina, E Granger, G Gagnon - Pattern recognition, 2018 - Elsevier
Multiple instance learning (MIL) is a form of weakly supervised learning where training
instances are arranged in sets, called bags, and a label is provided for the entire bag. This …

Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning

N Aafaq, N Akhtar, W Liu, SZ Gilani… - Proceedings of the …, 2019 - openaccess.thecvf.com
Automatic generation of video captions is a fundamental challenge in computer vision.
Recent techniques typically employ a combination of Convolutional Neural Networks …

Multilevel language and vision integration for text-to-clip retrieval

H Xu, K He, BA Plummer, L Sigal, S Sclaroff… - Proceedings of the …, 2019 - ojs.aaai.org
We address the problem of text-based activity retrieval in video. Given a sentence describing
an activity, our task is to retrieve matching clips from an untrimmed video. To capture the …

Describing video with attention-based bidirectional LSTM

Y Bin, Y Yang, F Shen, N Xie… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
Video captioning has been attracting broad research attention in the multimedia community.
However, most existing approaches heavily rely on static visual information or partially …

Video paragraph captioning using hierarchical recurrent neural networks

H Yu, J Wang, Z Huang, Y Yang… - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
We present an approach that exploits hierarchical Recurrent Neural Networks (RNNs) to
tackle the video captioning problem, i.e., generating one or multiple sentences to describe a …

Video captioning by adversarial LSTM

Y Yang, J Zhou, J Ai, Y Bin, A Hanjalic… - … on Image Processing, 2018 - ieeexplore.ieee.org
In this paper, we propose a novel approach to video captioning based on adversarial
learning and long short-term memory (LSTM). With this solution concept, we aim at …

Weakly supervised dense event captioning in videos

X Duan, W Huang, C Gan, J Wang… - Advances in Neural …, 2018 - proceedings.neurips.cc
Dense event captioning aims to detect and describe all events of interest contained in a
video. Despite the advanced development in this area, existing methods tackle this task by …