Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Host-based intrusion detection system with system calls: Review and future trends

M Liu, Z Xue, X Xu, C Zhong, J Chen - ACM computing surveys (CSUR), 2018 - dl.acm.org
In a contemporary data center, Linux applications often generate a large quantity of real-time
system call traces, which are not suitable for traditional host-based intrusion detection …

HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips

A Miech, D Zhukov, JB Alayrac… - Proceedings of the …, 2019 - openaccess.thecvf.com
Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com
How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

COOT: Cooperative hierarchical transformer for video-text representation learning

S Ging, M Zolfaghari, H Pirsiavash… - Advances in neural …, 2020 - proceedings.neurips.cc
Many real-world video-text tasks involve different levels of granularity, such as frames and
words, clip and sentences or videos and paragraphs, each with distinct semantics. In this …

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
Video captioning refers to automatically generating natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

Video captioning with attention-based LSTM and semantic consistency

L Gao, Z Guo, H Zhang, X Xu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …

Video question answering via gradually refined attention over appearance and motion

D Xu, Z Zhao, J Xiao, F Wu, H Zhang, X He… - Proceedings of the 25th …, 2017 - dl.acm.org
Recently image question answering (ImageQA) has gained lots of attention in the research
community. However, as its natural extension, video question answering (VideoQA) is less …