Kimi k1.5: Scaling reinforcement learning with LLMs
K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv…, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited to the amount of available training data. Scaling reinforcement …
InstInfer: In-storage attention offloading for cost-effective long-context LLM inference
The widespread adoption of Large Language Models (LLMs) marks a significant milestone in
generative AI. Nevertheless, the increasing context length and batch size in offline LLM …
Preble: Efficient distributed prompt scheduling for LLM serving
Prompts to large language models (LLMs) have evolved beyond simple user questions. For
LLMs to solve complex problems, today's practices are to include domain-specific …
BatchLLM: Optimizing large batched LLM inference with global prefix sharing and throughput-oriented token batching
Large language models (LLMs) increasingly play an important role in a wide range of
information processing and management tasks. Many of these tasks are performed in large …
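As a rough illustration of the prefix-sharing idea named in the title: requests in a large batch often begin with the same prompt prefix, whose KV cache only needs to be computed once per group. The sketch below buckets requests by a shared token prefix; the `group_by_shared_prefix` helper and the fixed 16-token prefix length are illustrative assumptions, not BatchLLM's actual global prefix analysis.

```python
from collections import defaultdict

def group_by_shared_prefix(requests, prefix_len=16):
    """Bucket requests whose token streams begin with the same prefix, so a
    serving engine could compute that prefix's KV cache once per bucket.
    (Illustrative fixed-length grouping, not BatchLLM's global prefix pass.)"""
    buckets = defaultdict(list)
    for req in requests:
        buckets[tuple(req["tokens"][:prefix_len])].append(req)
    return buckets

# Toy batch: requests 0 and 1 share their first 16 tokens, request 2 does not.
requests = [
    {"id": 0, "tokens": list(range(100))},
    {"id": 1, "tokens": list(range(16)) + [42] * 30},
    {"id": 2, "tokens": [7] * 50},
]
for prefix, group in group_by_shared_prefix(requests).items():
    ids = [r["id"] for r in group]
    print(f"{len(prefix)}-token prefix shared by requests {ids}")
```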
ExpertFlow: Optimized expert activation and token allocation for efficient mixture-of-experts inference
Sparse Mixture of Experts (MoE) models, while outperforming dense Large Language
Models (LLMs) in terms of performance, face significant deployment challenges during …
LayerKV: Optimizing large language model serving with layer-wise KV cache management
Y Xiong, H Wu, C Shao, Z Wang, R Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
The expanding context windows in large language models (LLMs) have greatly enhanced
their capabilities in various applications, but they also introduce significant challenges in …
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
We introduce Lexico, a novel KV cache compression method that leverages sparse coding
with a universal dictionary. Our key finding is that key-value cache in modern LLMs can be …
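The snippet names the core mechanism: approximate each key/value vector as a sparse combination of atoms from one fixed dictionary, then store only the few nonzero coefficients. Below is a minimal NumPy sketch of that generic idea using greedy orthogonal matching pursuit; the random unit-norm dictionary and the `sparse_code` helper are illustrative assumptions, not Lexico's actual dictionary or solver.

```python
import numpy as np

def sparse_code(x, D, k):
    """Approximate x as D @ w with at most k nonzero entries in w, via
    greedy orthogonal matching pursuit (OMP)."""
    residual = x.copy()
    support = []
    for _ in range(k):
        # Pick the dictionary atom most correlated with the residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    w = np.zeros(D.shape[1])
    w[support] = coef
    return w

# Toy example: compress one 128-dim "key" vector against a 512-atom dictionary.
rng = np.random.default_rng(0)
D = rng.standard_normal((128, 512))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
key = rng.standard_normal(128)
w = sparse_code(key, D, k=8)            # store 8 (index, value) pairs, not 128 floats
print("relative error:", np.linalg.norm(key - D @ w) / np.linalg.norm(key))
```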
Context Parallelism for Scalable Million-Token Inference
A Yang, J Yang, A Ibrahim, X Xie, B Tang… - arXiv preprint arXiv…, 2024 - arxiv.org
We present context parallelism for long-context large language model inference, which
achieves near-linear scaling for long-context prefill latency with up to 128 H100 GPUs …
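Prefill attention parallelizes over the sequence dimension because each query row's softmax is independent of the others; that is the property context parallelism exploits. Below is a toy single-process NumPy simulation of the idea; the four-way split and the all-gather stand-in are assumptions for illustration, and the paper's ring-based communication and causal masking are omitted.

```python
import numpy as np

def attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v

# Simulate context parallelism across 4 "devices": the sequence is sharded,
# and each device computes attention for its own query shard against the
# full (all-gathered) keys and values.
rng = np.random.default_rng(0)
seq, dim, world = 1024, 64, 4
q = rng.standard_normal((seq, dim))
k = rng.standard_normal((seq, dim))
v = rng.standard_normal((seq, dim))

q_shards = np.split(q, world)               # each rank owns seq/world query rows
out = np.concatenate([attention(qs, k, v) for qs in q_shards])
assert np.allclose(out, attention(q, k, v)) # sharded result matches single-device
```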
Tackling the dynamicity in a production LLM serving system with SOTA optimizations via hybrid prefill/decode/verify scheduling on efficient meta-kernels
M Song, X Tang, F Hou, J Li, W Wei, Y Ma… - arXiv preprint arXiv…, 2024 - arxiv.org
Meeting growing demands for low latency and cost efficiency in production-grade large
language model (LLM) serving systems requires integrating advanced optimization …
LLM Knowledge-Driven Target Prototype Learning for Few-Shot Segmentation
P Li, F Liu, L Jiao, S Li, X Liu, P Chen, L Li… - Knowledge-Based …, 2025 - Elsevier
Few-Shot Segmentation (FSS) aims to segment new class objects in a query image
with few support images. The prototype-based FSS methods first model a target prototype …
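For context on the prototype step the snippet refers to: in the standard prototype-based FSS baseline, the target prototype is a masked average of support features, and query pixels are labeled by their similarity to it. A minimal NumPy sketch of that generic pipeline follows; the function names and cosine threshold are illustrative, and this is the common baseline rather than the paper's LLM knowledge-driven variant.

```python
import numpy as np

def masked_average_prototype(support_feat, support_mask):
    """Average the support features inside the object mask to get one
    target prototype vector (classic masked average pooling)."""
    w = support_mask[..., None]             # (H, W, 1) weights in {0, 1}
    return (support_feat * w).sum(axis=(0, 1)) / max(w.sum(), 1)

def segment_by_similarity(query_feat, prototype, threshold=0.5):
    """Label query pixels whose cosine similarity to the prototype is high."""
    q = query_feat / (np.linalg.norm(query_feat, axis=-1, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return (q @ p) > threshold

rng = np.random.default_rng(0)
support_feat = rng.standard_normal((32, 32, 64))   # (H, W, C) feature map
support_mask = np.zeros((32, 32))
support_mask[8:24, 8:24] = 1                       # object region in support image
proto = masked_average_prototype(support_feat, support_mask)
pred = segment_by_similarity(rng.standard_normal((32, 32, 64)), proto)
print(pred.shape, pred.dtype)                      # (32, 32) bool
```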