RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Transformer-based Large Language Models (LLMs) have become increasingly important.
However, due to the quadratic time complexity of attention computation, scaling LLMs to …
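The quadratic cost arises because each query token scores every cached key. Below is a minimal NumPy sketch of the general idea behind retrieval-style sparse attention: compute exact attention over only the top-k keys most similar to the query. This illustrates the concept only, not RetrievalAttention's actual algorithm, which uses an approximate nearest-neighbor index rather than the exact top-k search shown here.

```python
import numpy as np

def full_attention(q, K, V):
    """Exact attention: one query against all n cached keys, O(n) per query,
    hence O(n^2) over a full sequence of queries."""
    scores = K @ q / np.sqrt(q.shape[-1])          # (n,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V                                   # (d,)

def topk_retrieval_attention(q, K, V, k=32):
    """Approximate attention in the spirit of retrieval-based methods:
    score only the k keys closest to the query (found exactly here;
    a real system would query an ANN index instead)."""
    scores = K @ q / np.sqrt(q.shape[-1])
    idx = np.argpartition(scores, -k)[-k:]         # top-k key positions
    s = scores[idx]
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
n, d = 4096, 64
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
q = rng.normal(size=d)
exact = full_attention(q, K, V)
approx = topk_retrieval_attention(q, K, V, k=256)
print(np.linalg.norm(exact - approx))              # small when attention is concentrated
```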
MagicPIG: LSH Sampling for Efficient LLM Generation
Large language models (LLMs) with long context windows have gained significant attention.
However, the KV cache, stored to avoid re-computation, becomes a bottleneck. Various …
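As a toy illustration of the LSH-sampling idea, the SimHash-style sketch below hashes cached keys with random hyperplanes and attends only over keys whose code collides with the query's. MagicPIG's actual sampler and its importance weighting are considerably more involved; all parameters and names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_bits = 4096, 64, 10

K = rng.normal(size=(n, d))                    # cached keys
V = rng.normal(size=(n, d))                    # cached values
planes = rng.normal(size=(n_bits, d))          # random hyperplanes (SimHash-style LSH)

def simhash(x):
    """Pack the sign pattern of x against the hyperplanes into one integer code."""
    bits = (x @ planes.T > 0).astype(np.int64)
    return bits @ (1 << np.arange(n_bits))

key_codes = simhash(K)                         # hash every key once, reused per query

def lsh_sampled_attention(q):
    """Attend only over keys whose LSH code collides with the query's code."""
    cand = np.flatnonzero(key_codes == simhash(q[None, :])[0])
    if cand.size == 0:                         # empty bucket: fall back to a random sample
        cand = rng.choice(n, size=64, replace=False)
    s = K[cand] @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V[cand]

print(lsh_sampled_attention(rng.normal(size=d)).shape)   # (64,)
```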
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
With the widespread deployment of long-context large language models (LLMs), there has
been a growing demand for efficient support of high-throughput inference. However, as the …
Squeezed Attention: Accelerating Long Context Length LLM Inference
Emerging Large Language Model (LLM) applications require long input prompts to perform
complex downstream tasks like document analysis and code generation. For these long …
A Survey on Large Language Model Acceleration Based on KV Cache Management
Large Language Models (LLMs) have revolutionized a wide range of domains such as
natural language processing, computer vision, and multi-modal tasks due to their ability to …
AdaptLink: A Heterogeneity-Aware Adaptive Framework for Distributed MLLM Inference
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance
in tasks such as commonsense reasoning and visual scene understanding. Despite their …
Data Proportion Detection for Optimized Data Management for Large Language Models
Large language models (LLMs) have demonstrated exceptional performance across a wide
range of tasks and domains, with data preparation playing a critical role in achieving these …
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
KV cache quantization can improve Large Language Model (LLM) inference throughput
and latency in long-context and large batch-size scenarios while preserving LLMs …
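The primitive underneath such methods is uniform integer quantization of the cached K/V tensors with a per-layer bit width. The sketch below shows that primitive with a hypothetical hand-written layer-to-bits map; finding that assignment automatically (the sensitivity-aware part) is KVTuner's actual contribution and is not reproduced here.

```python
import numpy as np

def quantize(x, n_bits):
    """Uniform symmetric quantization to signed n-bit integers, per tensor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Hypothetical layer-wise precision map: sensitive layers keep 8 bits,
# robust ones drop to 4 (this assignment is what KVTuner searches for).
bits_per_layer = {0: 8, 1: 8, 2: 4, 3: 4}

rng = np.random.default_rng(0)
for layer, bits in bits_per_layer.items():
    k_cache = rng.normal(size=(1024, 128)).astype(np.float32)
    q, s = quantize(k_cache, bits)
    err = np.abs(dequantize(q, s) - k_cache).mean()
    print(f"layer {layer}: {bits}-bit, mean abs error {err:.4f}")
```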
AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference
With the development of large language models (LLMs), efficient inference through
Key-Value (KV) cache compression has attracted considerable attention, especially for long …
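As a toy stand-in for exploiting temporal patterns in attention, the sketch below keeps an exponential moving average of past attention weights and uses it to predict which cached tokens to retain at the next decode step. The paper's learned predictor is more sophisticated than this EMA heuristic; budget and decay values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, budget, decay = 2048, 64, 256, 0.9

K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
ema = np.zeros(n)                                 # running estimate of token importance

def step(q):
    """One decode step: predict important tokens from past attention history."""
    global ema
    keep = np.argpartition(ema, -budget)[-budget:] if ema.any() else np.arange(n)
    s = K[keep] @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    full = np.zeros(n)
    full[keep] = w                                # scatter back for the history update
    ema = decay * ema + (1 - decay) * full        # temporal smoothing of attention
    return w @ V[keep]

for _ in range(4):                                # a few decode steps
    out = step(rng.normal(size=d))
print(out.shape)                                  # (64,)
```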
XKV: Personalized KV Cache Memory Reduction for Long-Context LLM Inference
Recently, generative Large Language Models (LLMs) have achieved remarkable success in
numerous applications. Notably, their inference generates output tokens one by one, leading …
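The memory pressure behind this line of work is easy to quantify: each decoded token appends one key and one value vector per layer and head, so the cache grows linearly with sequence length. A back-of-the-envelope calculation, assuming a Llama-2-7B-like shape (32 layers, 32 heads, head dimension 128, fp16):

```python
# KV cache size = 2 (K and V) * layers * heads * head_dim * seq_len * bytes_per_elem
layers, heads, head_dim, bytes_fp16 = 32, 32, 128, 2   # Llama-2-7B-like shape
for seq_len in (4_096, 32_768, 131_072):
    gib = 2 * layers * heads * head_dim * seq_len * bytes_fp16 / 2**30
    print(f"{seq_len:>7} tokens -> {gib:5.1f} GiB per sequence")
```

At 32K tokens this works out to roughly 16 GiB per sequence, which is why per-layer budget allocation and the other cache-reduction schemes listed above matter in practice.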