LLM inference unveiled: Survey and roofline model insights
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
Large language model inference acceleration: A comprehensive hardware perspective
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
fields, from natural language understanding to text generation. Compared to non-generative …
Resource-efficient Algorithms and Systems of Foundation Models: A Survey
Large foundation models, including large language models, vision transformers, diffusion,
and large language model based multimodal models, are revolutionizing the entire machine …
A survey on efficient inference for large language models
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …
New solutions on LLM acceleration, optimization, and application
Large Language Models (LLMs) have revolutionized a wide range of applications with their
strong human-like understanding and creativity. Due to the continuously growing model size …
LlamaF: An efficient Llama2 architecture accelerator on embedded FPGAs
Large language models (LLMs) have demonstrated remarkable abilities in natural language
processing. However, their deployment on resource-constrained embedded devices …
Efficient training and inference: Techniques for large language models using Llama
SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
To enhance the efficiency of language models, it would involve optimizing their training and
inference processes to reduce computational demands while maintaining high performance …
EdgeLLM: A highly efficient CPU-FPGA heterogeneous edge accelerator for large language models
The rapid advancements in artificial intelligence (AI), particularly the Large Language
Models (LLMs), have profoundly affected our daily work and communication forms …
A survey of small language models
Small Language Models (SLMs) have become increasingly important due to their efficiency
and performance to perform various language tasks with minimal computational resources …
FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs
Transformer neural networks (TNNs) are being applied across a widening range of
application domains, including natural language processing (NLP), machine translation, and …