Full stack optimization of transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

FORGE: Pre-training open foundation models for science

J Yin, S Dash, F Wang, M Shankar - Proceedings of the International …, 2023 - dl.acm.org
Large language models (LLMs) are poised to revolutionize the way we conduct scientific
research. However, both model complexity and pre-training cost are impeding effective …

Mobile Foundation Model as Firmware

J Yuan, C Yang, D Cai, S Wang, X Yuan… - arXiv preprint arXiv …, 2023 - caidongqi.com
In today's landscape, smartphones have evolved into hubs for hosting a multitude of deep
learning models aimed at local execution. A key realization driving this work is the notable …

Not all GPUs are created equal: characterizing variability in large-scale, accelerator-rich systems

P Sinha, A Guliani, R Jain, B Tran… - … Conference for High …, 2022 - ieeexplore.ieee.org
Scientists are increasingly exploring and utilizing the massive parallelism of general-
purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters …

Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms

Q Huang, PA Tsai, JS Emer… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The architectural design-space exploration (or DSE) process, whether manual or automated,
benefits greatly from knowing the limits of the metrics of interest in advance. Data movement …

Generative AI beyond LLMs: System implications of multi-modal generation

A Golden, S Hsia, F Sun, B Acun… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
As the development of large-scale Generative AI models evolves beyond text (1D) generation
to include image (2D) and video (3D) generation, processing spatial and temporal …

Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link

D Xu, Y Feng, K Shin, D Kim, H Jeon… - … Conference for High …, 2024 - ieeexplore.ieee.org
Deep learning (DL) models are becoming bigger, easily exceeding the memory capacity of
a single accelerator. Recent progress in large DL training utilizes CPU memory as an …

OptimStore: In-storage optimization of large-scale DNNs with on-die processing

J Kim, M Kang, Y Han, YG Kim… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Training deep neural network (DNN) models is a resource-intensive, iterative process. For
this reason, complex optimizers like Adam are now widely adopted, as they increase the …

A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models

H Sharma, P Dhingra, JR Doppa, U Ogras… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …

AMPeD: An analytical model for performance in distributed training of transformers

D Moolchandani, J Kundu, F Ruelens… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Transformers are a class of machine learning models that have recently attracted significant
interest for a multitude of reasons. They can process multiple modalities efficiently and have …