LLMCompass: Enabling efficient hardware design for large language model inference

H Zhang, A Ning, RB Prabhakar… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

Demystifying platform requirements for diverse LLM inference use cases

A Bambhaniya, R Raj, G Jeong, S Kundu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable performance across a wide range
of applications, often outperforming human experts. However, deploying these parameter …

Wafer-scale computing: Advancements, challenges, and future perspectives [feature]

Y Hu, X Lin, H Wang, Z He, X Yu… - IEEE Circuits and …, 2024 - ieeexplore.ieee.org
Nowadays, artificial intelligence (AI) technology with large models plays an increasingly
important role in both academia and industry. It also brings a rapidly increasing demand for …

Scaling down to scale up: A cost-benefit analysis of replacing OpenAI's LLM with open source SLMs in production

C Irugalbandara, A Mahendra… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
Many companies use large language models (LLMs) offered as a service, like OpenAI's
GPT-4, to create AI-enabled product experiences. Along with the benefits of ease-of-use and …

vTrain: A simulation framework for evaluating cost-effective and compute-optimal large language model training

J Bang, Y Choi, M Kim, Y Kim… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
As large language models (LLMs) become widespread in various application domains, a
critical challenge the AI community is facing is how to train these large AI models in a cost …

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

S Hsia, A Golden, B Acun, N Ardalani… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Training and deploying large-scale machine learning models is time-consuming, requires
significant distributed computing infrastructures, and incurs high operational costs. Our …

Towards cognitive AI systems: Workload and characterization of neuro-symbolic AI

Z Wan, CK Liu, H Yang, R Raj, C Li… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
The remarkable advancements in artificial intelligence (AI), primarily driven by deep neural
networks, are facing challenges surrounding unsustainable computational trajectories …

DeepFlow: A cross-stack pathfinding framework for distributed AI systems

N Ardalani, S Pal, P Gupta - ACM Transactions on Design Automation of …, 2024 - dl.acm.org
Over the past decade, machine learning model complexity has grown at an extraordinary
rate, as has the scale of the systems training such large models. However, there is an …

Performance modeling and workload analysis of distributed large language model training and inference

J Kundu, W Guo, A BanaGozar… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Aligning future system design with the ever-increasing compute needs of large language
models (LLMs) is undoubtedly an important problem in today's world. Here, we propose a …

Chiplet-Gym: Optimizing Chiplet-based AI Accelerator Design with Reinforcement Learning

K Mishty, M Sadi - IEEE Transactions on Computers, 2024 - ieeexplore.ieee.org
Modern Artificial Intelligence (AI) workloads demand computing systems with large silicon
area to sustain throughput and competitive performance. However, prohibitive …