Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
Deep neural models have, in recent years, been successful in almost every field, even
solving the most complex problems. However, these models are huge in size, with …
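
The student-teacher recipe surveyed here compresses a large teacher network into a compact student by matching their outputs. Below is a minimal sketch of temperature-scaled soft-target distillation; the toy models, temperature, and loss weighting are illustrative assumptions, not anything prescribed by the survey.

```python
# Minimal sketch of soft-target knowledge distillation (Hinton-style).
# Toy models and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))  # large model
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))    # compact model

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))
with torch.no_grad():
    t_logits = teacher(x)           # teacher is frozen during distillation
loss = distillation_loss(student(x), t_logits, y)
loss.backward()                     # gradients flow only into the student
```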

A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
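
The core operation behind the GNNs covered here is message passing: each node updates its representation from an aggregate of its neighbours' features. A minimal sketch with a dense adjacency matrix and mean aggregation; the toy graph and feature sizes are illustrative assumptions.

```python
# Minimal sketch of one message-passing layer with mean aggregation over neighbours.
import torch
import torch.nn as nn

class MeanAggregationLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        # adj: dense (N, N) adjacency; add self-loops and row-normalise to average neighbours.
        adj = adj + torch.eye(adj.size(0))
        adj = adj / adj.sum(dim=1, keepdim=True)
        return torch.relu(self.linear(adj @ node_feats))

# Toy graph: 4 nodes with 8-d features, edges 0-1, 1-2, 2-3.
feats = torch.randn(4, 8)
adj = torch.zeros(4, 4)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0
layer = MeanAggregationLayer(8, 16)
print(layer(feats, adj).shape)  # torch.Size([4, 16])
```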

GIT: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …
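
As a rough mental model of such a generative image-to-text setup, projected image features can act as a prefix that a causal text decoder conditions on while emitting caption tokens one at a time. The sketch below uses tiny random modules; the dimensions, vocabulary, and mask layout are assumptions for illustration, not GIT's actual architecture.

```python
# Hedged sketch of prefix-conditioned caption generation: image features are projected
# into token space, and a transformer with a seq2seq-style mask (image tokens fully
# visible, text tokens causal) predicts the next caption token. All modules are
# untrained stand-ins; shapes and vocabulary are assumptions.
import torch
import torch.nn as nn

vocab_size, d_model, n_img = 100, 64, 4
token_emb = nn.Embedding(vocab_size, d_model)
img_proj = nn.Linear(512, d_model)        # map image-encoder features into token space
decoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
lm_head = nn.Linear(d_model, vocab_size)

def build_mask(n_img, n_txt):
    # Image positions attend to all image positions; text positions attend to the
    # image prefix plus earlier text positions only.
    n = n_img + n_txt
    mask = torch.full((n, n), float("-inf"))
    mask[:, :n_img] = 0.0
    mask[n_img:, n_img:] = torch.triu(
        torch.full((n_txt, n_txt), float("-inf")), diagonal=1
    )
    return mask

@torch.no_grad()
def generate(img_feats, bos_id=1, max_len=8):
    prefix = img_proj(img_feats)                      # (1, n_img, d_model)
    tokens = torch.tensor([[bos_id]])
    for _ in range(max_len):
        x = torch.cat([prefix, token_emb(tokens)], dim=1)
        h = decoder(x, mask=build_mask(n_img, tokens.size(1)))
        next_id = lm_head(h[:, -1]).argmax(dim=-1, keepdim=True)   # greedy decoding
        tokens = torch.cat([tokens, next_id], dim=1)
    return tokens

print(generate(torch.randn(1, n_img, 512)))   # random token ids from random weights
```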

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric …

SwinBERT: End-to-end transformers with sparse attention for video captioning

K Lin, L Li, CC Lin, F Ahmed, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
The canonical approach to video captioning requires a caption-generation model to learn
from offline-extracted dense video features. These feature extractors usually operate on …
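
The sparse attention in the title refers to learning which token-to-token interactions over the densely sampled video input are worth keeping. Below is a hedged sketch of one way to realise that idea, a learnable attention mask pushed towards sparsity by a penalty term; the shapes and regulariser weight are assumptions, not SwinBERT's actual design.

```python
# Sketch of attention modulated by a learnable sparsity mask.
import torch
import torch.nn as nn

class SparseMaskedAttention(nn.Module):
    def __init__(self, num_tokens, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Learnable per-pair mask logits, encouraged towards sparsity by a penalty term.
        self.mask_logits = nn.Parameter(torch.zeros(num_tokens, num_tokens))

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.size(-1) ** 0.5, dim=-1)
        mask = torch.sigmoid(self.mask_logits)
        attn = attn * mask                        # suppress masked-out token pairs
        return attn @ v, mask.mean()              # second output: sparsity penalty

tokens = torch.randn(1, 16, 32)                   # e.g. 16 video tokens, 32-d each
layer = SparseMaskedAttention(16, 32)
out, sparsity = layer(tokens)
loss = out.pow(2).mean() + 1e-3 * sparsity        # placeholder task loss + sparsity term
loss.backward()
```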

CLIP4Clip: An empirical study of CLIP for end-to-end video clip retrieval and captioning

H Luo, L Ji, M Zhong, Y Chen, W Lei, N Duan, T Li - Neurocomputing, 2022 - Elsevier
Video clip retrieval and captioning tasks play an essential role in multimodal research and
are fundamental research problems for multimodal understanding and generation. The …
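
One of the simplest video-text similarity calculators explored in this line of work treats a clip as the mean of its per-frame CLIP embeddings and compares that to the caption embedding by cosine similarity. A minimal sketch, with random tensors standing in for real CLIP outputs:

```python
# Minimal sketch of parameter-free video-text similarity: mean-pool per-frame
# image embeddings, then take the cosine similarity with the text embedding.
import torch
import torch.nn.functional as F

def video_text_similarity(frame_embeds, text_embed):
    # frame_embeds: (num_frames, dim); text_embed: (dim,)
    video_embed = frame_embeds.mean(dim=0)            # temporal mean pooling
    video_embed = F.normalize(video_embed, dim=-1)
    text_embed = F.normalize(text_embed, dim=-1)
    return (video_embed * text_embed).sum()           # cosine similarity

frames = torch.randn(12, 512)    # 12 sampled frames, 512-d CLIP-style embeddings (assumed)
caption = torch.randn(512)
print(video_text_similarity(frames, caption))
```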

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

K Bayoudh, R Knani, F Hamdaoui, A Mtibaa - The Visual Computer, 2022 - Springer
Research in multimodal learning has progressed rapidly over the last decade in
several areas, especially computer vision. The growing potential of multimodal data …

Video pivoting unsupervised multi-modal machine translation

M Li, PY Huang, X Chang, J Hu, Y Yang… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The main challenge in the field of unsupervised machine translation (UMT) is to associate
source-target sentences in the latent space. As people who speak different languages share …

Knowledge distillation via the target-aware transformer

S Lin, H **e, B Wang, K Yu, X Chang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation has become a de facto standard for improving the performance of
small neural networks. Most previous works propose to regress the representational …
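
The feature regression that "most previous works" rely on can be pictured as projecting the student's intermediate feature map to the teacher's width and minimising an MSE between the two. A minimal sketch follows; the shapes and the 1x1 projection are illustrative assumptions, and this one-to-one spatial matching is the baseline the paper sets out to improve on.

```python
# Minimal sketch of feature-regression distillation on intermediate feature maps.
import torch
import torch.nn as nn
import torch.nn.functional as F

student_feat = torch.randn(8, 64, 7, 7, requires_grad=True)   # (B, C_student, H, W)
teacher_feat = torch.randn(8, 256, 7, 7)                       # (B, C_teacher, H, W)

proj = nn.Conv2d(64, 256, kernel_size=1)        # 1x1 conv to match channel widths
loss = F.mse_loss(proj(student_feat), teacher_feat)
loss.backward()                                 # gradients flow into student feature and proj
```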