A state-of-the-art survey on deep learning theory and architectures

MZ Alom, TM Taha, C Yakopcic, S Westberg, P Sidike… - Electronics, 2019 - mdpi.com
In recent years, deep learning has garnered tremendous success in a variety of application
domains. This new field of machine learning has been growing rapidly and has been …

The history began from AlexNet: A comprehensive survey on deep learning approaches

MZ Alom, TM Taha, C Yakopcic, S Westberg… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning has demonstrated tremendous success in a variety of application domains in
the past few years. This new field of machine learning has been growing rapidly and applied …

CLIP4Clip: An empirical study of CLIP for end-to-end video clip retrieval and captioning

H Luo, L Ji, M Zhong, Y Chen, W Lei, N Duan, T Li - Neurocomputing, 2022 - Elsevier
Video clip retrieval and captioning tasks play an essential role in multimodal research and
are the fundamental research problem for multimodal understanding and generation. The …

AI Choreographer: Music conditioned 3D dance generation with AIST++

R Li, S Yang, DA Ross… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present AIST++, a new multi-modal dataset of 3D dance motion and music, along with
FACT, a Full-Attention Cross-modal Transformer network for generating 3D dance motion …

Exploring and distilling posterior and prior knowledge for radiology report generation

F Liu, X Wu, S Ge, W Fan, Y Zou - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Automatically generating radiology reports can improve current clinical practice in diagnostic
radiology. On one hand, it can relieve radiologists from the heavy burden of report writing; …

CLIP4Clip: An empirical study of CLIP for end-to-end video clip retrieval

H Luo, L Ji, M Zhong, Y Chen, W Lei, N Duan… - arXiv preprint arXiv …, 2021 - arxiv.org
Video-text retrieval plays an essential role in multi-modal research and has been widely
used in many real-world web applications. The CLIP (Contrastive Language-Image Pre …

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …

HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips

A Miech, D Zhukov, JB Alayrac… - Proceedings of the …, 2019 - openaccess.thecvf.com
Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …

CenterCLIP: Token clustering for efficient text-video retrieval

S Zhao, L Zhu, X Wang, Y Yang - … of the 45th International ACM SIGIR …, 2022 - dl.acm.org
Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal
research such as text-video retrieval. In CLIP, transformers are vital for modeling …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …