Text data augmentation for deep learning
Natural Language Processing (NLP) is one of the most captivating applications of
Deep Learning. In this survey, we consider how the Data Augmentation training strategy can …
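The survey's subject is easy to picture with a toy example: the sketch below applies two simple token-level perturbations (random swap and random deletion) in the spirit of EDA-style text augmentation. The function names and probabilities are illustrative choices, not code from the survey.

```python
import random

def random_swap(tokens, n_swaps=1):
    """Swap two randomly chosen token positions n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token independently with probability p, keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

sentence = "data augmentation improves model robustness".split()
print(random_swap(sentence))      # e.g. two words exchanged
print(random_deletion(sentence))  # e.g. one word dropped
```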
A review on dropout regularization approaches for deep neural networks within the scholarly domain
Dropout is one of the most popular regularization methods in the scholarly domain for
preventing a neural network model from overfitting in the training phase. Developing an …
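For reference, here is a minimal sketch of standard inverted dropout as applied during training; this is the textbook formulation rather than anything specific to the review.

```python
import torch

def dropout(x: torch.Tensor, p: float = 0.5, training: bool = True) -> torch.Tensor:
    """Inverted dropout: zero units with probability p during training and
    rescale the survivors by 1/(1-p) so expected activations are unchanged."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()
    return x * mask / (1.0 - p)

x = torch.ones(2, 4)
print(dropout(x, p=0.5))           # roughly half the entries zeroed, rest scaled to 2.0
print(dropout(x, training=False))  # identity at inference time
```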
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …
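The "dual masking" of the title refers to masking on both sides of the autoencoder: the encoder sees only a small visible subset of tokens, and the decoder reconstructs only a subset of the masked positions rather than all of them. The sketch below is a loose illustration of that index bookkeeping; the ratios and token counts are assumed for illustration, not taken from the paper.

```python
import torch

def dual_mask_indices(n_tokens: int, enc_ratio: float = 0.9, dec_ratio: float = 0.5):
    """Encoder keeps (1 - enc_ratio) of the tokens; the decoder reconstructs
    only a dec_ratio subset of the masked positions instead of all of them,
    shrinking decoder cost roughly in proportion."""
    perm = torch.randperm(n_tokens)
    n_visible = int(n_tokens * (1 - enc_ratio))
    visible_idx = perm[:n_visible]
    masked_idx = perm[n_visible:]
    n_decode = int(len(masked_idx) * dec_ratio)
    decode_idx = masked_idx[torch.randperm(len(masked_idx))[:n_decode]]
    return visible_idx, decode_idx

vis, dec = dual_mask_indices(1568)
print(len(vis), len(dec))  # 156 visible tokens, 706 reconstruction targets
```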
Masked autoencoders as spatiotemporal learners
This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …
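A minimal sketch of the random spacetime masking step, assuming a video already tokenized into a (batch, tokens, dim) tensor of patch embeddings; the 90% ratio reflects the paper's very high masking regime, but the shapes and names here are illustrative.

```python
import torch

def random_spacetime_mask(tokens: torch.Tensor, mask_ratio: float = 0.9):
    """tokens: (B, N, D) spacetime patch embeddings.
    Returns the visible tokens for the encoder plus the indices needed
    to scatter decoder predictions back for reconstruction."""
    B, N, D = tokens.shape
    n_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                     # one random score per token
    keep_idx = noise.argsort(dim=1)[:, :n_keep]  # lowest-noise tokens survive
    visible = torch.gather(tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, D))
    return visible, keep_idx

video_tokens = torch.randn(2, 1568, 768)  # e.g. 8x14x14 spacetime patches
visible, keep_idx = random_spacetime_mask(video_tokens)
print(visible.shape)  # torch.Size([2, 156, 768])
```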
Vision GNN: An image is worth graph of nodes
Network architecture plays a key role in the deep learning-based computer vision system.
The widely-used convolutional neural network and transformer treat the image as a grid or …
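The graph alternative treats patch embeddings as nodes connected to their nearest neighbors in feature space. A compact k-NN graph construction with a simple neighbor-aggregation step is sketched below; k, the shapes, and the mean-aggregation rule are assumptions for illustration, not the paper's exact graph convolution.

```python
import torch

def knn_patch_graph(patches: torch.Tensor, k: int = 9):
    """patches: (N, D) patch embeddings treated as graph nodes.
    Connect each node to its k nearest neighbors in feature space and
    aggregate neighbor features as a simple graph-convolution step."""
    dist = torch.cdist(patches, patches)                       # (N, N) pairwise distances
    knn_idx = dist.topk(k + 1, largest=False).indices[:, 1:]   # drop self-match
    neighbors = patches[knn_idx]                               # (N, k, D)
    return patches + neighbors.mean(dim=1)                     # aggregate + residual

patches = torch.randn(196, 192)  # 14x14 patches from a 224x224 image
print(knn_patch_graph(patches).shape)  # torch.Size([196, 192])
```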
VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …
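VideoMAE's signature design is tube masking at an extremely high ratio: one spatial mask is sampled and repeated across all frames, so a hidden patch cannot be trivially recovered from the same location in a neighboring frame. A rough sketch, with the grid size and 90% ratio assumed:

```python
import torch

def tube_mask(t_frames: int = 8, h: int = 14, w: int = 14, mask_ratio: float = 0.9):
    """Sample one spatial mask over the h*w patch grid and repeat it along
    time, so the same spatial locations are hidden in every frame ('tubes')."""
    n_spatial = h * w
    n_masked = int(n_spatial * mask_ratio)
    spatial_mask = torch.zeros(n_spatial, dtype=torch.bool)
    spatial_mask[torch.randperm(n_spatial)[:n_masked]] = True
    return spatial_mask.unsqueeze(0).expand(t_frames, -1)  # (T, H*W)

mask = tube_mask()
print(mask.shape, mask.float().mean().item())  # torch.Size([8, 196]) ~0.9
```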
InternVideo: General video foundation models via generative and discriminative learning
The foundation models have recently shown excellent performance on a variety of
downstream tasks in computer vision. However, most existing vision foundation models …
MViTv2: Improved multiscale vision transformers for classification and detection
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …
DaViT: Dual attention vision transformers
In this work, we introduce Dual Attention Vision Transformers (DaViT), a simple yet effective
vision transformer architecture that is able to capture global context while maintaining …
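The "dual" in DaViT pairs ordinary spatial self-attention with channel-wise self-attention, in which the feature map is transposed so channels attend to one another and each channel aggregates information from the whole image. The sketch below shows only the channel half, single-headed and with all dimensions assumed:

```python
import torch

def channel_attention(x: torch.Tensor) -> torch.Tensor:
    """x: (B, N, C) tokens. Transpose so channels play the role of tokens:
    each channel attends over all other channels, giving a global
    (image-wide) interaction at cost linear in the number of tokens N."""
    q = k = v = x.transpose(1, 2)  # (B, C, N)
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return (attn @ v).transpose(1, 2)  # back to (B, N, C)

x = torch.randn(2, 196, 96)
print(channel_attention(x).shape)  # torch.Size([2, 196, 96])
```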
FILIP: Fine-grained interactive language-image pre-training
Unsupervised large-scale vision-language pre-training has shown promising advances on
various downstream tasks. Existing methods often model the cross-modal interaction either …
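FILIP's fine-grained interaction replaces the single global image-text dot product of CLIP-style models with token-wise late interaction: each image patch is scored against its most similar text token and vice versa, and the maxima are averaged. A minimal single-pair sketch, with shapes and the symmetric averaging assumed (the paper uses the two directions separately in its contrastive loss):

```python
import torch
import torch.nn.functional as F

def filip_similarity(img_tokens: torch.Tensor, txt_tokens: torch.Tensor) -> torch.Tensor:
    """img_tokens: (Ni, D), txt_tokens: (Nt, D).
    Token-wise late interaction: take each token's best match in the
    other modality, average, and symmetrize the two directions."""
    img = F.normalize(img_tokens, dim=-1)
    txt = F.normalize(txt_tokens, dim=-1)
    sim = img @ txt.t()                 # (Ni, Nt) cosine similarities
    i2t = sim.max(dim=1).values.mean()  # each patch -> best text token
    t2i = sim.max(dim=0).values.mean()  # each text token -> best patch
    return 0.5 * (i2t + t2i)

print(filip_similarity(torch.randn(196, 512), torch.randn(32, 512)))
```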