Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
Deep neural models have, in recent years, been successful in almost every field, even
solving the most complex problems. However, these models are huge in size, with …
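
The student-teacher recipe surveyed here compresses a large teacher network into a compact student by matching their outputs. Below is a minimal sketch of temperature-scaled soft-target distillation; the toy models, temperature, and loss weighting are illustrative assumptions, not anything prescribed by the survey.

```python
# Minimal sketch of soft-target knowledge distillation (Hinton-style).
# Toy models and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))  # large model
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))    # compact model

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)           # teacher is frozen during distillation
loss = distillation_loss(student(x), t_logits, y)
loss.backward()                     # gradients flow only into the student
```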

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
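
The core operation behind the GNNs covered here is message passing: each node updates its representation from an aggregate of its neighbours' features. A minimal sketch with a dense adjacency matrix and mean aggregation; the toy graph and feature sizes are illustrative assumptions.

```python
# Minimal sketch of one message-passing layer with mean aggregation over neighbours.
import torch
import torch.nn as nn

class MeanAggregationLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # adj: dense (N, N) adjacency; add self-loops and row-normalise to average neighbours.
        adj = adj + torch.eye(adj.size(0))
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj @ node_feats))

# Toy graph: 4 nodes with 8-d features, edges 0-1, 1-2, 2-3.
feats = torch.randn(4, 8)
adj = torch.zeros(4, 4)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
layer = MeanAggregationLayer(8, 16)
print(layer(feats, adj).shape)  # torch.Size([4, 16])
```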

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …
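
As a rough mental model of such a generative image-to-text setup, projected image features can act as a prefix that a causal text decoder conditions on while emitting caption tokens one at a time. The sketch below uses tiny random modules; the dimensions, vocabulary, and mask layout are assumptions for illustration, not GIT's actual architecture.

```python
# Hedged sketch of prefix-conditioned caption generation: image features are projected
# into token space, and a transformer with a seq2seq-style mask (image tokens fully
# visible, text tokens causal) predicts the next caption token. All modules are
# untrained stand-ins; shapes and vocabulary are assumptions.
import torch
import torch.nn as nn

vocab_size, d_model, n_img = 100, 64, 4
token_emb = nn.Embedding(vocab_size, d_model)
img_proj = nn.Linear(512, d_model)        # map image-encoder features into token space
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
lm_head = nn.Linear(d_model, vocab_size)

def build_mask(n_img, n_txt):
    # Image positions attend to all image positions; text positions attend to the
    # image prefix plus earlier text positions only.
    n = n_img + n_txt
    mask = torch.full((n, n), float("-inf"))
    mask[:, :n_img] = 0.0
    mask[n_img:, n_img:] = torch.triu(
        torch.full((n_txt, n_txt), float("-inf")), diagonal=1
    )
    return mask

@torch.no_grad()
def generate(img_feats, bos_id=1, max_len=8):
    prefix = img_proj(img_feats)                      # (1, n_img, d_model)
    tokens = torch.tensor([[bos_id]])
    for _ in range(max_len):
        x = torch.cat([prefix, token_emb(tokens)], dim=1)
        h = decoder(x, mask=build_mask(n_img, tokens.size(1)))
        next_id = lm_head(h[:, -1]).argmax(dim=-1, keepdim=True)   # greedy decoding
        tokens = torch.cat([tokens, next_id], dim=1)
    return tokens

print(generate(torch.randn(1, n_img, 512)))   # random token ids from random weights
```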

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric …

SwinBERT: End-to-end transformers with sparse attention for video captioning

K Lin, L Li, CC Lin, F Ahmed, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
The canonical approach to video captioning requires a caption-generation model to learn
from offline-extracted dense video features. These feature extractors usually operate on …
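
The sparse attention in the title refers to learning which token-to-token interactions over the densely sampled video input are worth keeping. Below is a hedged sketch of one way to realise that idea, a learnable attention mask pushed towards sparsity by a penalty term; the shapes and regulariser weight are assumptions, not SwinBERT's actual design.

```python
# Sketch of attention modulated by a learnable sparsity mask.
import torch
import torch.nn as nn

class SparseMaskedAttention(nn.Module):
    def __init__(self, num_tokens, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Learnable per-pair mask logits, encouraged towards sparsity by a penalty term.
        self.mask_logits = nn.Parameter(torch.zeros(num_tokens, num_tokens))

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        mask = torch.sigmoid(self.mask_logits)
        attn = attn * mask                        # suppress masked-out token pairs
        return attn @ v, mask.mean()              # second output: sparsity penalty

tokens = torch.randn(1, 16, 32)                   # e.g. 16 video tokens, 32-d each
layer = SparseMaskedAttention(16, 32)
out, sparsity = layer(tokens)
loss = out.pow(2).mean() + 1e-3 * sparsity        # placeholder task loss + sparsity term
loss.backward()
```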

CLIP4Clip: An empirical study of CLIP for end-to-end video clip retrieval and captioning

H Luo, L Ji, M Zhong, Y Chen, W Lei, N Duan, T Li - Neurocomputing, 2022 - Elsevier
Video clip retrieval and captioning tasks play an essential role in multimodal research and
are fundamental research problems for multimodal understanding and generation. The …
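
One of the simplest video-text similarity calculators explored in this line of work treats a clip as the mean of its per-frame CLIP embeddings and compares that to the caption embedding by cosine similarity. A minimal sketch, with random tensors standing in for real CLIP outputs:

```python
# Minimal sketch of parameter-free video-text similarity: mean-pool per-frame
# image embeddings, then take the cosine similarity with the text embedding.
import torch
import torch.nn.functional as F

def video_text_similarity(frame_embeds, text_embed):
    # frame_embeds: (num_frames, dim); text_embed: (dim,)
    video_embed = frame_embeds.mean(dim=0)            # temporal mean pooling
    video_embed = F.normalize(video_embed, dim=-1)
    text_embed = F.normalize(text_embed, dim=-1)
    return (video_embed * text_embed).sum()           # cosine similarity

frames = torch.randn(12, 512)    # 12 sampled frames, 512-d CLIP-style embeddings (assumed)
caption = torch.randn(512)
print(video_text_similarity(frames, caption))
```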

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
Research in multimodal learning has progressed rapidly over the last decade in
several areas, especially computer vision. The growing potential of multimodal data …

Video pivoting unsupervised multi-modal machine translation

M Li, PY Huang, X Chang, J Hu, Y Yang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The main challenge in the field of unsupervised machine translation (UMT) is to associate
source-target sentences in the latent space. As people who speak different languages share …

Knowledge distillation via the target-aware transformer

S Lin, H **e, B Wang, K Yu, X Chang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation has become a de facto standard for improving the performance of
small neural networks. Most previous works propose to regress the representational …
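
The feature regression that "most previous works" rely on can be pictured as projecting the student's intermediate feature map to the teacher's width and minimising an MSE between the two. A minimal sketch follows; the shapes and the 1x1 projection are illustrative assumptions, and this one-to-one spatial matching is the baseline the paper sets out to improve on.

```python
# Minimal sketch of feature-regression distillation on intermediate feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

student_feat = torch.randn(8, 64, 7, 7, requires_grad=True)   # (B, C_student, H, W)
teacher_feat = torch.randn(8, 256, 7, 7)                       # (B, C_teacher, H, W)

proj = nn.Conv2d(64, 256, kernel_size=1)        # 1x1 conv to match channel widths
loss = F.mse_loss(proj(student_feat), teacher_feat)
loss.backward()                                 # gradients flow into student feature and proj
```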