LoopTree: Exploring the Fused-layer Dataflow Accelerator Design Space

M Gilbert, YN Wu, JS Emer… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Latency and energy consumption are key metrics in the performance of deep neural network
(DNN) accelerators. A significant factor contributing to latency and energy is data transfers …

Energy Cost Modelling for Optimizing Large Language Model Inference on Hardware Accelerators

R Geens, M Shi, A Symons, C Fang… - 2024 IEEE 37th …, 2024 - ieeexplore.ieee.org
The rise of Large Language Models (LLMs) has significantly escalated the demand for
efficient LLM inference, primarily fulfilled through cloud-based GPU computing. This …