FACET: On-the-Fly Activation Compression for Efficient Transformer Training

S Lee, G Yun, XT Nguyen… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Training Transformer models, known for their outstanding performance in various tasks, can
be challenging due to extensive training times and substantial memory requirements. One …

Balanced Data Placement for GEMV Acceleration with Processing-In-Memory

MA Ibrahim, M Islam, S Aga - arXiv preprint arXiv:2403.20297, 2024 - arxiv.org
With unprecedented demand for generative AI (GenAI) inference, acceleration of primitives
that dominate GenAI such as general matrix-vector multiplication (GEMV) is receiving …

PIMnast: Balanced Data Placement for GEMV Acceleration with Processing-In-Memory

MA Ibrahim, M Islam, S Aga - SC24-W: Workshops of the …, 2024 - ieeexplore.ieee.org
With unprecedented demand for generative AI (GenAI) inference, acceleration of primitives
that dominate GenAI such as general matrix-vector multiplication (GEMV) is receiving …

Cross-Stack Optimizations for Sequence-Based Models on GPUs

S Pati - 2024 - search.proquest.com
Advancements in the field of machine learning have made deep neural networks (DNNs)
ubiquitous. Their application in the domain of natural language processing (NLP) with …