Large language model inference acceleration: A comprehensive hardware perspective
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …
Memory-Centric Computing: Recent Advances in Processing-in-DRAM
Memory-centric computing aims to enable computation capability in and near all places
where data is generated and stored. As such, it can greatly reduce the large negative …
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching
Large language models (LLMs) have emerged due to their capability to generate high-
quality content across diverse contexts. To reduce their explosively increasing demands for …
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures
Graph Neural Networks (GNNs) are emerging models to analyze graph-structured data. GNN
execution involves both compute-intensive and memory-intensive kernels. The latter kernels …
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
The rapid development of large language models (LLMs) has significantly transformed the
field of artificial intelligence, demonstrating remarkable capabilities in natural language …
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Recently, there has been an extensive research effort in building efficient large language
model (LLM) inference serving systems. These efforts not only include innovations in the …
INF^2: High-Throughput Generative Inference of Large Language Models using Near-Storage Processing
The growing memory and computational demands of large language models (LLMs) for
generative inference present significant challenges for practical deployment. One promising …
PAPI: Exploiting Dynamic Parallelism in Large Language Model Decoding with a Processing-In-Memory-Enabled Computing System
Large language models (LLMs) are widely used for natural language understanding and
text generation. An LLM relies on a time-consuming step called LLM decoding to …
PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference
Large Language Model (LLM) inference generates tokens autoregressively, one
token at a time, which exhibits notably lower operational intensity compared to earlier …
GenAI at the Edge: Comprehensive Survey on Empowering Edge Devices
Generative Artificial Intelligence (GenAI) applies models and algorithms such as Large
Language Models (LLMs) and Foundation Models (FMs) to generate new data. GenAI, as a …