Multi-task learning for dense prediction tasks: A survey
S Vandenhende, S Georgoulis… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
With the advent of deep learning, many dense prediction tasks, i.e., tasks that produce pixel-
level predictions, have seen significant performance improvements. The typical approach is …
TokenLearner: Adaptive space-time tokenization for videos
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …
MovieNet: A holistic dataset for movie understanding
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …
TokenLearner: What can 8 learned tokens do for images and videos?
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …
Temporal cross-layer correlation mining for action recognition
Neighboring frames are more correlated compared to frames from further temporal
distances. In this paper, we aim to explore the temporal correlations among neighboring …
Would mega-scale datasets further enhance spatiotemporal 3D CNNs?
How can we collect and use a video dataset to further improve spatiotemporal 3D
Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question …
Revisiting multi-task learning in the deep learning era
Despite the recent progress in deep learning, most approaches still go for a silo-like
solution, focusing on learning each task in isolation: training a separate neural network for …
AssembleNet: Searching for multi-stream neural connectivity in video architectures
Learning to represent videos is a very challenging task both algorithmically and
computationally. Standard video CNN architectures have been designed by directly …
ViT-ReT: Vision and recurrent transformer neural networks for human activity recognition in videos
Human activity recognition is an emerging and important area in computer vision which
seeks to determine the activity an individual or group of individuals are performing. The …
Learning interactions and relationships between movie characters
Interactions between people are often governed by their relationships. On the flip side,
social relationships are built upon several interactions. Two strangers are more likely to …