Google Scholar

Video description: A survey of methods, datasets, and evaluation metrics

N Aafaq, A Mian, W Liu, SZ Gilani, M Shah - ACM Computing Surveys …, 2019 - dl.acm.org

Video description is the automatic generation of natural language sentences that describe
the contents of a given video. It has applications in human-robot interaction, helping the …

Cited by 257, all 10 versions

Host-based intrusion detection system with system calls: Review and future trends

M Liu, Z Xue, X Xu, C Zhong, J Chen - ACM computing surveys (CSUR), 2018 - dl.acm.org

In a contemporary data center, Linux applications often generate a large quantity of real-time
system call traces, which are not suitable for traditional host-based intrusion detection …

Cited by 210, all 4 versions

Howto100m: Learning a text-video embedding by watching hundred million narrated video clips

A Miech, D Zhukov, JB Alayrac… - Proceedings of the …, 2019 - openaccess.thecvf.com

Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com

Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …

Cited by 369, all 8 versions

Videos as space-time region graphs

X Wang, A Gupta - Proceedings of the European …, 2018 - openaccess.thecvf.com

How do humans recognize the action "opening a book"? We argue that there are two
important cues: modeling temporal shape dynamics and modeling functional relationships …

Coot: Cooperative hierarchical transformer for video-text representation learning

S Ging, M Zolfaghari, H Pirsiavash… - Advances in neural …, 2020 - proceedings.neurips.cc

Many real-world video-text tasks involve different levels of granularity, such as frames and
words, clip and sentences or videos and paragraphs, each with distinct semantics. In this …

STAT: Spatial-temporal attention mechanism for video captioning

C Yan, Y Tu, X Wang, Y Zhang, X Hao… - IEEE transactions on …, 2019 - ieeexplore.ieee.org

Video captioning refers to automatically generating natural language sentences, which
summarize the video contents. Inspired by the visual attention mechanism of human beings …

Cited by 407, all 4 versions

Hierarchical conditional relation networks for video question answering

TM Le, V Le, S Venkatesh… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Video question answering (VideoQA) is challenging as it requires modeling capacity to distill
dynamic visual artifacts and distant relations and to associate them with linguistic concepts …

Video captioning with attention-based LSTM and semantic consistency

L Gao, Z Guo, H Zhang, X Xu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org

Recent progress in using long short-term memory (LSTM) for image captioning has
motivated the exploration of their applications for video captioning. By taking a video as a …

Cited by 689, all 4 versions

Video question answering via gradually refined attention over appearance and motion

D Xu, Z Zhao, J Xiao, F Wu, H Zhang, X He… - Proceedings of the 25th …, 2017 - dl.acm.org

Recently image question answering (ImageQA) has gained lots of attention in the research
community. However, as its natural extension, video question answering (VideoQA) is less …

Cited by 629, all 3 versions

Saved to library

Hierarchical recurrent neural encoder for video representation with application to captioning

Video description: A survey of methods, datasets, and evaluation metrics
