Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
FORGE: Pre-training open foundation models for science
Large language models (LLMs) are poised to revolutionize the way we conduct scientific
research. However, both model complexity and pre-training cost are impeding effective …
Mobile Foundation Model as Firmware
In today's landscape, smartphones have evolved into hubs for hosting a multitude of deep
learning models aimed at local execution. A key realization driving this work is the notable …
Not all GPUs are created equal: characterizing variability in large-scale, accelerator-rich systems
Scientists are increasingly exploring and utilizing the massive parallelism of general-purpose accelerators such as GPUs for scientific breakthroughs. As a result, datacenters …
Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms
The architectural design-space exploration (or DSE) process, whether manual or automated, benefits greatly from knowing the limits of the metrics of interest in advance. Data movement …
Generative AI beyond LLMs: System implications of multi-modal generation
As the development of large-scale Generative AI models evolves beyond text (1D) generation to include image (2D) and video (3D) generation, processing spatial and temporal …
Efficient Tensor Offloading for Large Deep-Learning Model Training based on Compute Express Link
Deep learning (DL) models are becoming bigger, easily exceeding the memory capacity of a single accelerator. Recent progress in large DL training utilizes CPU memory as an …
OptimStore: In-storage optimization of large-scale DNNs with on-die processing
Training deep neural network (DNN) models is a resource-intensive, iterative process. For this reason, complex optimizers like Adam are now widely adopted, as they increase the …
A Heterogeneous Chiplet Architecture for Accelerating End-to-End Transformer Models
Transformers have revolutionized deep learning and generative modeling, enabling
unprecedented advancements in natural language processing tasks. However, the size of …
AMPeD: An analytical model for performance in distributed training of transformers
Transformers are a class of machine learning models that have recently attracted considerable interest for a multitude of reasons. They can process multiple modalities efficiently and have …