SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Emerging multi-model workloads with heavy models like recent large language models have
significantly increased the compute and memory demands on hardware. To address such …
Tandem processor: Grappling with emerging operators in neural networks
With the ever-increasing prevalence of neural networks and the upheaval from language
models, it is time to rethink neural acceleration. Up to this point, the broader research …
Intel accelerators ecosystem: An SoC-oriented perspective: Industry product
A growing demand for hyperscale services has compelled hyperscalers to deploy more
compute resources at an unprecedented pace, further accelerated by the demise of …
MAD-Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
Training and deploying large-scale machine learning models is time-consuming, requires
significant distributed computing infrastructures, and incurs high operational costs. Our …
Data motion acceleration: Chaining cross-domain multi accelerators
There has been an arms race in recent years to devise accelerators for deep learning.
However, real-world applications are not only neural networks but often span across …
MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization
Quantization of foundational models (FMs) is significantly more challenging than that of traditional
DNNs due to the emergence of large-magnitude features called outliers. Existing outlier …
Forward Learning of Large Language Models by Consumer Devices
Large Language Models achieve state-of-the-art performance on a broad variety of Natural
Language Processing tasks. In the pervasive IoT era, their deployment on edge devices is …
Understanding Performance Implications of LLM Inference on CPUs
The remarkable performance of LLMs has led to their application in a wide range of fields,
with data centers utilizing expensive accelerators such as GPUs and TPUs to support LLM …
Meta's Hyperscale Infrastructure: Overview and Insights
C Tang - Communications of the ACM, 2025 - dl.acm.org
DOI: 10.1145/3701296. A look at Meta's …
XEM: Tensor accelerator for AB21 supercomputing artificial intelligence processor
W Jeon, MY Lee, JH Lee, CG Lyuh - ETRI Journal, 2024 - Wiley Online Library
As computing systems grow ever larger, high-performance computing (HPC) is
gaining importance. In particular, as hyperscale artificial intelligence (AI) applications, such …