Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, hel** the …

A review of deep learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Video captioning refers to automatic generate natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Video captioning using global-local representation

L Yan, S Ma, Q Wang, Y Chen, X Zhang… - … on Circuits and …, 2022 - ieeexplore.ieee.org
Video captioning is a challenging task as it needs to accurately transform visual
understanding into natural language description. To date, state-of-the-art methods …

Reconstruction network for video captioning

B Wang, L Ma, W Zhang, W Liu - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
In this paper, the problem of describing visual contents of a video sequence with natural
language is addressed. Unlike previous video captioning work mainly exploiting the cues of …

Memory-attended recurrent network for video captioning

W Pei, J Zhang, X Wang, L Ke… - Proceedings of the …, 2019 - openaccess.thecvf.com
Typical techniques for video captioning follow the encoder-decoder framework, which can
only focus on one source video being processed. A potential disadvantage of such design is …

Syntax-aware action targeting for video captioning

Q Zheng, C Wang, D Tao - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
Existing methods on video captioning have made great efforts to identify objects/instances in
videos, but few of them emphasize the prediction of action. As a result, the learned models …

Controllable video captioning with pos sequence guidance based on gated fusion network

B Wang, L Ma, W Zhang, W Jiang… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper, we propose to guide the video caption generation with Part-of-Speech (POS)
information, based on a gated fusion of multiple representations of input videos. We …

Video captioning via hierarchical reinforcement learning

X Wang, W Chen, J Wu, YF Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Video captioning is the task of automatically generating a textual description of the actions in
a video. Although previous work (eg sequence-to-sequence model) has shown promising …

Trends in integration of vision and language research: A survey of tasks, datasets, and methods

A Mogadala, M Kalimuthu, D Klakow - Journal of Artificial Intelligence …, 2021 - jair.org
Abstract Interest in Artificial Intelligence (AI) and its applications has seen unprecedented
growth in the last few years. This success can be partly attributed to the advancements made …