Video description: A survey of methods, datasets, and evaluation metrics
Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …
Host-based intrusion detection system with system calls: Review and future trends
M Liu, Z Xue, X Xu, C Zhong, J Chen - ACM computing surveys (CSUR), 2018 - dl.acm.org
In a contemporary data center, Linux applications often generate a large quantity of real-time
system call traces, which are not suitable for traditional host-based intrusion detection …
Howto100m: Learning a text-video embedding by watching hundred million narrated video clips
Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …
Object relational graph with teacher-recommended learning for video captioning
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …
Videos as space-time region graphs
How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …
Coot: Cooperative hierarchical transformer for video-text representation learning
Many real-world video-text tasks involve different levels of granularity, such as frames and
words, clip and sentences or videos and paragraphs, each with distinct semantics. In this …
STAT: Spatial-temporal attention mechanism for video captioning
Video captioning refers to automatically generating natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …
Hierarchical conditional relation networks for video question answering
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …
Video captioning with attention-based LSTM and semantic consistency
Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …
Video question answering via gradually refined attention over appearance and motion
Recently image question answering (ImageQA) has gained lots of attention in the research
community. However, as its natural extension, video question answering (VideoQA) is less …