Not all images are worth 16x16 words: Dynamic transformers for efficient image recognition

Y Wang, R Huang, S Song… - Advances in neural …, 2021 - proceedings.neurips.cc
Vision Transformers (ViT) have achieved remarkable success in large-scale image
recognition. They split every 2D image into a fixed number of patches, each of which is …

Revisiting weakly supervised pre-training of visual perception models

M Singh, L Gustafson, A Adcock… - Proceedings of the …, 2022 - openaccess.thecvf.com
Model pre-training is a cornerstone of modern visual recognition systems. Although
fully supervised pre-training on datasets like ImageNet is still the de-facto standard, recent …

Multi-Scale MLP-Mixer for image classification

H Zhang, ZX Dong, B Li, S He - Knowledge-Based Systems, 2022 - Elsevier
MLP-Mixer is a vision architecture that relies solely on multilayer perceptrons (MLPs), which,
despite their simple architecture, achieve slightly inferior accuracy to the state-of-the …

Better together: Jointly optimizing ML collective scheduling and execution planning using SYNDICATE

K Mahajan, CH Chu, S Sridharan, A Akella - 20th USENIX Symposium …, 2023 - usenix.org
Emerging ML training deployments are trending towards larger models, and hybrid-parallel
training that is not just dominated by compute-intensive all-reduce for gradient aggregation …

Method cards for prescriptive machine-learning transparency

D Adkins, B Alsallakh, A Cheema… - Proceedings of the 1st …, 2022 - dl.acm.org
Specialized documentation techniques have been developed to communicate key facts
about machine-learning (ML) systems and the datasets and models they rely on …

Prescriptive and descriptive approaches to machine-learning transparency

D Adkins, B Alsallakh, A Cheema… - CHI conference on …, 2022 - dl.acm.org
Specialized documentation techniques have been developed to communicate key facts
about machine-learning (ML) systems and the datasets and models they rely on …

When Large Kernel Meets Vision Transformer: A Solution for SnakeCLEF & FungiCLEF.

Y Shen, X Sun, Z Zhu - CLEF (Working Notes), 2022 - ceur-ws.org
LifeCLEF 2022 is an evaluation campaign that is being organized as part of the CLEF
initiative labs. This paper records solutions to two competitions in LifeCLEF 2022, i.e. …

Auto-X3D: Ultra-efficient video understanding via finer-grained neural architecture search

Y Jiang, X Gong, J Wu, H Shi… - Proceedings of the …, 2022 - openaccess.thecvf.com
Efficient video architecture is the key to the deployment of video action recognition systems
on devices with limited computing capabilities. Unfortunately, existing video architectures …