Pre-trained language models and their applications

H Wang, J Li, H Wu, E Hovy, Y Sun - Engineering, 2023 - Elsevier
Pre-trained language models have achieved striking success in natural language
processing (NLP), leading to a paradigm shift from supervised learning to pre-training …

Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

GPT3.int8(): 8-bit matrix multiplication for transformers at scale

T Dettmers, M Lewis, Y Belkada… - Advances in neural …, 2022 - proceedings.neurips.cc
Large language models have been widely adopted but require significant GPU memory for
inference. We develop a procedure for Int8 matrix multiplication for feed-forward and …
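The core technique named in the snippet is easy to illustrate. Below is a minimal sketch of absmax int8 quantized matrix multiplication in NumPy; note that the paper itself uses finer-grained vector-wise scaling plus a mixed-precision decomposition for outlier features, which this toy version omits. All function names here are illustrative, not the paper's API.

```python
import numpy as np

def quantize_absmax(x: np.ndarray):
    """Scale a float matrix into int8 using per-tensor absmax scaling."""
    scale = 127.0 / np.max(np.abs(x))
    return np.round(x * scale).astype(np.int8), scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Multiply two float matrices via int8, accumulating in int32."""
    qa, sa = quantize_absmax(a)
    qb, sb = quantize_absmax(b)
    # Accumulate in int32 to avoid int8 overflow, then dequantize.
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc / (sa * sb)

a = np.random.randn(4, 8).astype(np.float32)
b = np.random.randn(8, 4).astype(np.float32)
print(np.max(np.abs(int8_matmul(a, b) - a @ b)))  # small quantization error
```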

AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org
Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …
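As a toy illustration of the conventional use of model parallelism the snippet describes, the sketch below splits one linear layer's weight column-wise across simulated "devices" (plain arrays standing in for GPUs). This is just the underlying partitioning idea, not AlpaServe's serving policy; the helper names are hypothetical.

```python
import numpy as np

# Toy tensor parallelism: split a linear layer's weight column-wise across
# "devices" (here, just separate arrays), compute each shard independently,
# and concatenate the partial outputs. Real systems place each shard on its
# own GPU so the full weight never has to fit on one device.
def split_linear(weight: np.ndarray, n_devices: int):
    return np.split(weight, n_devices, axis=1)  # one column shard per device

def parallel_forward(x: np.ndarray, shards) -> np.ndarray:
    partial = [x @ w for w in shards]        # each device computes its shard
    return np.concatenate(partial, axis=1)   # gather the full output

w = np.random.randn(16, 8)
x = np.random.randn(2, 16)
shards = split_linear(w, n_devices=4)
assert np.allclose(parallel_forward(x, shards), x @ w)
```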

PaLM: Scaling language modeling with pathways

A Chowdhery, S Narang, J Devlin, M Bosma… - Journal of Machine …, 2023 - jmlr.org
Large language models have been shown to achieve remarkable performance across a
variety of natural language tasks using few-shot learning, which drastically reduces the …
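A concrete (entirely hypothetical) example of the few-shot learning the snippet refers to: the task is specified through a handful of in-context examples in the prompt rather than through gradient updates on task-specific data.

```python
# A hypothetical few-shot prompt: the model infers the task from the
# in-context examples instead of being fine-tuned on them.
prompt = """Translate English to French.

English: cheese
French: fromage

English: house
French: maison

English: book
French:"""
# completion = some_llm.generate(prompt)  # hypothetical call; expected: " livre"
```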

Orca: A distributed serving system for Transformer-Based generative models

GI Yu, JS Jeong, GW Kim, S Kim, BG Chun - 16th USENIX Symposium …, 2022 - usenix.org
Large-scale Transformer-based models trained for generation tasks (e.g., GPT-3) have
recently attracted huge interest, emphasizing the need for system support for serving models …
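Orca's headline mechanism is iteration-level scheduling: the serving batch is rebuilt after every decoding iteration, so finished requests leave immediately and queued requests join without waiting for the whole batch to drain. The toy sketch below captures only that scheduling loop; decode_step is a stand-in for a real model call, and all names are illustrative.

```python
from collections import deque

def decode_step(request) -> bool:
    """Stand-in for one transformer decoding iteration on one request."""
    request["generated"] += 1
    return request["generated"] >= request["max_tokens"]  # True when finished

def serve(requests, max_batch=4):
    """Toy iteration-level scheduler: revisit the batch after every step."""
    queue, running, done = deque(requests), [], []
    while queue or running:
        # Admit waiting requests up to the batch limit (per-iteration scheduling).
        while queue and len(running) < max_batch:
            running.append(queue.popleft())
        # Run exactly one decoding iteration for the current batch.
        still_running = []
        for req in running:
            if decode_step(req):
                done.append(req)  # finished: leaves the batch immediately
            else:
                still_running.append(req)
        running = still_running
    return done

reqs = [{"id": i, "generated": 0, "max_tokens": 2 + i} for i in range(6)]
print([r["id"] for r in serve(reqs)])
```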

Reducing activation recomputation in large transformer models

VA Korthikanti, J Casper, S Lym… - Proceedings of …, 2023 - proceedings.mlsys.org
Training large transformer models is one of the most important computational challenges of
modern AI. In this paper, we show how to significantly accelerate the training of large …
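For context, the baseline this paper targets is full activation recomputation (checkpointing): activations are discarded during the forward pass and recomputed during backward, trading extra compute for a large cut in activation memory. The sketch below shows that baseline using PyTorch's torch.utils.checkpoint API; the paper's actual contributions (sequence parallelism and selective recomputation) are not shown.

```python
import torch
from torch.utils.checkpoint import checkpoint

# Wrap a transformer-style block in checkpointing: its intermediate
# activations are not stored in the forward pass and are recomputed
# on-the-fly during the backward pass.
block = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
x = torch.randn(8, 1024, requires_grad=True)

y = checkpoint(block, x, use_reentrant=False)  # activations discarded here
y.sum().backward()                             # block is re-run to get grads
```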

Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
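The best-known Megatron-LM trick addresses exactly this memory constraint: split the transformer MLP's first weight column-wise and its second weight row-wise, so each GPU computes its slice independently and only a single all-reduce per MLP is needed. A NumPy toy sketch, with array shards standing in for GPUs and a plain sum standing in for the all-reduce:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of GELU, applied elementwise on each shard."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def parallel_mlp(x, w1, w2, n_devices=2):
    w1_shards = np.split(w1, n_devices, axis=1)  # column-parallel first layer
    w2_shards = np.split(w2, n_devices, axis=0)  # row-parallel second layer
    # Each "device" computes its slice with no sync in between; summing the
    # partial results stands in for the single all-reduce at the end.
    partials = [gelu(x @ a) @ b for a, b in zip(w1_shards, w2_shards)]
    return sum(partials)

x = np.random.randn(2, 64)
w1, w2 = np.random.randn(64, 256), np.random.randn(256, 64)
assert np.allclose(parallel_mlp(x, w1, w2), gelu(x @ w1) @ w2)
```

The column-then-row split works because GELU is elementwise: each shard of the hidden activation depends only on its own column slice of the first weight, so no communication is needed until the final sum.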