Efficient large language models: A survey
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …
QuIP: 2-bit quantization of large language models with guarantees
This work studies post-training parameter quantization in large language models (LLMs).
We introduce quantization with incoherence processing (QuIP), a new method based on the …
A survey of safety and trustworthiness of large language models through the lens of verification and validation
Large language models (LLMs) have exploded a new heatwave of AI for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …
Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization
Large language models (LLMs) face the challenges in fine-tuning and deployment due to
their high memory demands and computational costs. While parameter-efficient fine-tuning …
LLMLingua: Compressing prompts for accelerated inference of large language models
Large language models (LLMs) have been applied in various applications due to their
astonishing capabilities. With advancements in technologies such as chain-of-thought (CoT) …
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …
Efficiently Programming Large Language Models using SGLang.
Large language models (LLMs) are increasingly used for complex tasks that require multiple
generation calls, advanced prompting techniques, control flow, and structured …
NetLLM: Adapting large language models for networking
Many networking tasks now employ deep learning (DL) to solve complex prediction and
optimization problems. However, current design philosophy of DL-based algorithms entails …
Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks
Large Language Models (LLMs) have shown promising performance in
knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However …
Towards efficient generative large language model serving: A survey from algorithms to systems
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …