SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators
Emerging multi-model workloads with heavy models like recent large language models
significantly increased the compute and memory demands on hardware. To address such …
HyDe: A Hybrid PCM/FeFET/SRAM Device-search for Optimizing Area and Energy-efficiencies in Analog IMC Platforms
Today, there is a plethora of In-Memory Computing (IMC) devices (SRAMs, PCMs, and
FeFETs) that emulate convolutions on crossbar-arrays with high throughput. Each IMC …
HISIM: Analytical Performance Modeling and Design Space Exploration of 2.5D/3D Integration for AI Computing
Monolithic designs face significant fabrication cost and data movement challenges,
especially when executing complex and diverse AI models. Advanced 2.5D/3D packaging …
Exploiting 2.5D/3D Heterogeneous Integration for AI Computing
The evolution of AI algorithms has not only revolutionized many application domains, but
also posed tremendous challenges on the hardware platform. Advanced packaging …
Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception
We study the application of emerging chiplet-based Neural Processing Units to accelerate
vehicular AI perception workloads in constrained automotive settings. The motivation stems …
3D In-Sensor Computing for Real-Time DVS Data Compression: 65nm Hardware-Algorithm Co-Design
Traditional IO links are insufficient to transport high volumes of image sensor data under
stringent power and latency constraints. To address this, we demonstrate a low latency, low …
Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets
To address increasing compute demand from recent multi-model workloads with heavy
models like large language models, we propose to deploy heterogeneous chiplet-based …
A 16nm Heterogeneous Accelerator for Energy-Efficient Sparse and Dense AI Computing
Artificial intelligence (AI) has evolved from dense Deep Neural Networks (DNNs) toward a
diverse set of models, such as sparse graph convolutional neural networks (GCNs). These …
End-to-End Benchmarking of Chiplet-Based In-Memory Computing
In-memory computing (IMC)-based hardware reduces latency and energy
consumption for compute-intensive machine learning (ML) applications. Several …
Benchmarking Heterogeneous Integration with 2.5D/3D Interconnect Modeling
Current monolithic designs face significant challenges in terms of silicon area, fabrication
cost, and data movement especially when dealing with increasingly complex and diverse AI …