Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Memorization without overfitting: Analyzing the training dynamics of large language models

K Tirumala, A Markosyan… - Advances in …, 2022 - proceedings.neurips.cc
Despite their wide adoption, the underlying training and memorization dynamics of very
large language models are not well understood. We empirically study exact memorization in …

Scaling laws for generative mixed-modal language models

A Aghajanyan, L Yu, A Conneau… - International …, 2023 - proceedings.mlr.press
Generative language models define distributions over sequences of tokens that can
represent essentially any combination of data modalities (e.g., any permutation of image …

Scaling up models and data with t5x and seqio

A Roberts, HW Chung, G Mishra, A Levskaya… - Journal of Machine …, 2023 - jmlr.org
Scaling up training datasets and model parameters has benefited neural network-based
language models, but also presents challenges like distributed compute, input data …

CM3: A causal masked multimodal model of the internet

A Aghajanyan, B Huang, C Ross, V Karpukhin… - arXiv preprint arXiv …, 2022 - arxiv.org
We introduce CM3, a family of causally masked generative models trained over a large
corpus of structured multi-modal documents that can contain both text and image tokens …

Colossal-AI: A unified deep learning system for large-scale parallel training

S Li, H Liu, Z Bian, J Fang, H Huang, Y Liu… - Proceedings of the …, 2023 - dl.acm.org
The success of Transformer models has pushed the deep learning model scale to billions of
parameters, but the memory limitation of a single GPU has led to an urgent need for training …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Galvatron: Efficient transformer training over multiple gpus using automatic parallelism

X Miao, Y Wang, Y Jiang, C Shi, X Nie, H Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformer models have achieved state-of-the-art performance across various application
domains and have gradually become the foundation of advanced large deep learning …