- Academic Search

Multi-task learning for dense prediction tasks: A survey

S Vandenhende, S Georgoulis… - IEEE transactions on …, 2021 - ieeexplore.ieee.org

With the advent of deep learning, many dense prediction tasks, i.e., tasks that produce pixel-level
predictions, have seen significant performance improvements. The typical approach is …

Cited by 837 · 11 versions

Tokenlearner: Adaptive space-time tokenization for videos

M Ryoo, AJ Piergiovanni, A Arnab… - Advances in neural …, 2021 - proceedings.neurips.cc

In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …

Cited by 170 · 9 versions

Movienet: A holistic dataset for movie understanding

Q Huang, Y Xiong, A Rao, J Wang, D Lin - Computer Vision–ECCV 2020 …, 2020 - Springer

Recent years have seen remarkable advances in visual understanding. However, how to
understand a story-based long video with artistic styles, e.g., movie, remains challenging. In …

Cited by 266 · 4 versions

Tokenlearner: What can 8 learned tokens do for images and videos?

MS Ryoo, AJ Piergiovanni, A Arnab… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we introduce a novel visual representation learning which relies on a handful
of adaptively learned tokens, and which is applicable to both image and video …

Cited by 133 · 2 versions

Temporal cross-layer correlation mining for action recognition

L Zhu, H Fan, Y Luo, M Xu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org

Neighboring frames are more correlated compared to frames from further temporal
distances. In this paper, we aim to explore the temporal correlations among neighboring …

Cited by 84 · 4 versions

Would mega-scale datasets further enhance spatiotemporal 3D CNNs?

H Kataoka, T Wakamiya, K Hara, Y Satoh - arXiv preprint arXiv …, 2020 - arxiv.org

How can we collect and use a video dataset to further improve spatiotemporal 3D
Convolutional Neural Networks (3D CNNs)? In order to positively answer this open question …

Cited by 128 · 2 versions

Revisiting multi-task learning in the deep learning era

S Vandenhende, S Georgoulis… - arXiv preprint arXiv …, 2020 - homes.esat.kuleuven.be

Despite the recent progress in deep learning, most approaches still go for a silo-like
solution, focusing on learning each task in isolation: training a separate neural network for …

Cited by 96 · 2 versions

Assemblenet: Searching for multi-stream neural connectivity in video architectures

MS Ryoo, AJ Piergiovanni, M Tan… - arXiv preprint arXiv …, 2019 - arxiv.org

Learning to represent videos is a very challenging task both algorithmically and
computationally. Standard video CNN architectures have been designed by directly …

Cited by 121 · 6 versions

Vit-ret: Vision and recurrent transformer neural networks for human activity recognition in videos

J Wensel, H Ullah, A Munir - IEEE Access, 2023 - ieeexplore.ieee.org

Human activity recognition is an emerging and important area in computer vision which
seeks to determine the activity an individual or group of individuals are performing. The …

Cited by 50 · 6 versions

Learning interactions and relationships between movie characters

A Kukleva, M Tapaswi, I Laptev - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com

Interactions between people are often governed by their relationships. On the flip side,
social relationships are built upon several interactions. Two strangers are more likely to …

Cited by 72 · 11 versions

Holistic large scale video understanding

Multi-task learning for dense prediction tasks: A survey
