A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
FlexGen: High-throughput generative inference of large language models with a single GPU
The high computational and memory requirements of large language model (LLM) inference
make it feasible only with multiple high-end accelerators. Motivated by the emerging …
Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization
Large language models (LLMs) face the challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
LUT-GEMM: Quantized matrix multiplication based on LUTs for efficient inference in large-scale generative language models
The recent advancements in self-supervised learning, combined with the Transformer
architecture, have enabled natural language processing (NLP) to achieve remarkably low …
LQ-LoRA: Low-rank plus quantized matrix decomposition for efficient language model finetuning
We propose a simple approach for memory-efficient adaptation of pretrained language
models. Our approach uses an iterative algorithm to decompose each pretrained matrix into …
A comprehensive survey of compression algorithms for language models
How can we compress language models without sacrificing accuracy? The number of
compression algorithms for language models is rapidly growing to benefit from remarkable …
NOLA: Networks as linear combination of low rank random basis
Large Language Models (LLMs) have recently gained popularity due to their
impressive few-shot performance across various downstream tasks. However, fine-tuning all …
LLM-Commentator: Novel fine-tuning strategies of large language models for automatic commentary generation using football event data
A Cook, O Karakuş - Knowledge-Based Systems, 2024 - Elsevier
Real-time commentary on football matches is a challenging task that requires precise and
coherent descriptions of events as they unfold. Traditional methods often fall short in …
Model compression and efficient inference for large language models: A survey
Transformer-based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …
NEO: Saving GPU memory crisis with CPU offloading for online LLM inference
Online LLM inference powers many exciting applications such as intelligent chatbots and
autonomous agents. Modern LLM inference engines widely rely on request batching to …