Deep learning in natural language processing: A state-of-the-art survey
Deep learning has raised the interest of the research community owing to its overwhelming successes in
information processing tasks such as video/speech recognition. In this paper, we …
Timeception for complex action recognition
This paper focuses on the temporal aspect for recognizing human activities in videos, an
important visual cue that has long been undervalued. We revisit the conventional definition …
Exploiting feature and class relationships in video categorization with regularized deep neural networks
In this paper, we study the challenging problem of categorizing videos according to
high-level semantics such as the existence of a particular human action or a complex event …
Video generation from text
Generating videos from text has proven to be a significant challenge for existing generative
models. We tackle this problem by training a conditional generative model to extract both …
Predicting visual features from text for image and video caption retrieval
This paper strives to find amidst a set of sentences the one best describing the content of a
given image or video. Different from existing works, which rely on a joint subspace for their …
Multi-shot temporal event localization: a benchmark
Current developments in temporal event or action localization usually target actions
captured by a single camera. However, extensive events or actions in the wild may be …
SoccerNet: A scalable dataset for action spotting in soccer videos
In this paper, we introduce SoccerNet, a benchmark for action spotting in soccer videos. The
dataset is composed of 500 complete soccer games from six main European leagues …
W2VV++: Fully deep learning for ad-hoc video search
Ad-hoc video search (AVS) is an important yet challenging problem in multimedia retrieval.
Different from previous concept-based methods, we propose a fully deep learning method …
Hawkes processes for events in social media
This chapter provides an accessible introduction to point processes, and especially Hawkes
processes, for modeling discrete, inter-dependent events over continuous time. We start by …
Omni-sourced webly-supervised learning for video recognition
We introduce OmniSource, a novel framework for leveraging web data to train video
recognition models. OmniSource overcomes the barriers between data formats, such as …