Multi-task learning for dense prediction tasks: A survey

S Vandenhende, S Georgoulis… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
With the advent of deep learning, many dense prediction tasks, i.e., tasks that produce
pixel-level predictions, have seen significant performance improvements. The typical approach is …

Tokenlearner: Adaptive space-time tokenization for videos

M Ryoo, AJ Piergiovanni, A Arnab… - Advances in neural …, 2021 - proceedings.neurips.cc
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …

Movienet: A holistic dataset for movie understanding

Q Huang, Y Xiong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer
Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …

Tokenlearner: What can 8 learned tokens do for images and videos?

MS Ryoo, AJ Piergiovanni, A Arnab… - arxiv preprint arxiv …, 2021 - arxiv.org
In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …

Temporal cross-layer correlation mining for action recognition

L Zhu, H Fan, Y Luo, M Xu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Neighboring frames are more correlated compared to frames from further temporal
distances. In this paper, we aim to explore the temporal correlations among neighboring …

Would mega-scale datasets further enhance spatiotemporal 3D CNNs?

H Kataoka, T Wakamiya, K Hara, Y Satoh - arxiv preprint arxiv …, 2020 - arxiv.org
How can we collect and use a video dataset to further improve spatiotemporal 3D
Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question …

Revisiting multi-task learning in the deep learning era

S Vandenhende, S Georgoulis… - arxiv preprint arxiv …, 2020 - homes.esat.kuleuven.be
Despite the recent progress in deep learning, most approaches still go for a silo-like
solution, focusing on learning each task in isolation: training a separate neural network for …

Assemblenet: Searching for multi-stream neural connectivity in video architectures

MS Ryoo, AJ Piergiovanni, M Tan… - arxiv preprint arxiv …, 2019 - arxiv.org
Learning to represent videos is a very challenging task both algorithmically and
computationally. Standard video CNN architectures have been designed by directly …

Vit-ret: Vision and recurrent transformer neural networks for human activity recognition in videos

J Wensel, H Ullah, A Munir - IEEE Access, 2023 - ieeexplore.ieee.org
Human activity recognition is an emerging and important area in computer vision which
seeks to determine the activity an individual or group of individuals are performing. The …

Learning interactions and relationships between movie characters

A Kukleva, M Tapaswi, I Laptev - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Interactions between people are often governed by their relationships. On the flip side,
social relationships are built upon several interactions. Two strangers are more likely to …