SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators

M Odema, L Chen, H Kwon… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Emerging multi-model workloads with heavy models, like recent large language models,
have significantly increased the compute and memory demands on hardware. To address such …

Tandem processor: Grappling with emerging operators in neural networks

S Ghodrati, S Kinzer, H Xu, R Mahapatra… - Proceedings of the 29th …, 2024 - dl.acm.org
With the ever-increasing prevalence of neural networks and the upheaval from language
models, it is time to rethink neural acceleration. Up to this point, the broader research …

Intel accelerators ecosystem: An SoC-oriented perspective: Industry product

Y Yuan, R Wang, N Ranganathan… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
A growing demand for hyperscale services has compelled hyperscalers to deploy more
compute resources at an unprecedented pace, further accelerated by the demise of …

MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

S Hsia, A Golden, B Acun, N Ardalani… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Training and deploying large-scale machine learning models is time-consuming, requires
significant distributed computing infrastructures, and incurs high operational costs. Our …

Data motion acceleration: Chaining cross-domain multi accelerators

ST Wang, H Xu, A Mamandipoor… - … Symposium on High …, 2024 - ieeexplore.ieee.org
There has been an arms race for devising accelerators for deep learning in recent years.
However, real-world applications are not only neural networks but often span across …

MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization

A Ramachandran, S Kundu, T Krishna - arXiv preprint arXiv:2411.05282, 2024 - arxiv.org
Quantization of foundational models (FMs) is significantly more challenging than traditional
DNNs due to the emergence of large magnitude features called outliers. Existing outlier …

Forward Learning of Large Language Models by Consumer Devices

DP Pau, FM Aymone - Electronics, 2024 - mdpi.com
Large Language Models achieve state-of-the-art performance on a broad variety of Natural
Language Processing tasks. In the pervasive IoT era, their deployment on edge devices is …

Understanding Performance Implications of LLM Inference on CPUs

S Na, G Jeong, BH Ahn, J Young… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
The remarkable performance of LLMs has led to their application in a wide range of fields,
with data centers utilizing expensive accelerators such as GPUs and TPUs to support LLM …

Meta's Hyperscale Infrastructure: Overview and Insights

C Tang - Communications of the ACM, 2025 - dl.acm.org
DOI: 10.1145/3701296
A look at Meta's …

XEM: Tensor accelerator for AB21 supercomputing artificial intelligence processor

W Jeon, MY Lee, JH Lee, CG Lyuh - ETRI Journal, 2024 - Wiley Online Library
As computing systems become increasingly large, high-performance computing (HPC) is
gaining importance. In particular, as hyperscale artificial intelligence (AI) applications, such …