Adaptable logical control for large language models

H Zhang, PN Kung, M Yoshida… - Advances in …, 2025 - proceedings.neurips.cc
Despite the success of Large Language Models (LLMs) on various tasks following human
instructions, controlling model generation to follow strict constraints at inference time poses …

Optimizing instructions and demonstrations for multi-stage language model programs

K Opsahl-Ong, MJ Ryan, J Purtell, D Broman… - arXiv preprint arXiv …, 2024 - arxiv.org
Language Model Programs, i.e., sophisticated pipelines of modular language model (LM)
calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly …

Kimi k1.5: Scaling reinforcement learning with LLMs

K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited by the amount of available training data. Scaling reinforcement …

Stateful large language model serving with Pensieve

L Yu, J Lin, J Li - arXiv preprint arXiv:2312.05516, 2023 - arxiv.org
Large Language Models (LLMs) are wildly popular today and it is important to serve them
efficiently. Existing LLM serving systems are stateless across requests. Consequently, when …

vAttention: Dynamic memory management for serving LLMs without PagedAttention

R Prabhu, A Nayak, J Mohan, R Ramjee… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient management of GPU memory is essential for high-throughput LLM inference. Prior
systems reserved KV-cache memory ahead of time, which resulted in wasted capacity …

Neo: Saving GPU memory crisis with CPU offloading for online LLM inference

X Jiang, Y Zhou, S Cao, I Stoica, M Yu - arXiv preprint arXiv:2411.01142, 2024 - arxiv.org
Online LLM inference powers many exciting applications such as intelligent chatbots and
autonomous agents. Modern LLM inference engines widely rely on request batching to …

StructuredRAG: JSON response formatting with large language models

C Shorten, C Pierse, TB Smith, E Cardenas… - arXiv preprint arXiv …, 2024 - arxiv.org
The ability of Large Language Models (LLMs) to generate structured outputs, such as JSON,
is crucial for their use in Compound AI Systems. However, evaluating and improving this …

User Behavior Simulation with Large Language Model-based Agents

L Wang, J Zhang, H Yang, ZY Chen, J Tang… - ACM Transactions on …, 2025 - dl.acm.org
Simulating high-quality user behavior data has always been a fundamental yet challenging
problem in human-centered applications such as recommendation systems, social networks …

TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection

W Wu, Z Pan, C Wang, L Chen, Y Bai, K Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the development of large language models (LLMs), the ability to handle longer contexts
has become a key capability for Web applications such as cross-document understanding …

UBER: Uncertainty-Based Evolution with Large Language Models for Automatic Heuristic Design

Z Chen, Z Zhou, Y Lu, R Xu, L Pan, Z Lan - arXiv preprint arXiv …, 2024 - arxiv.org
NP-hard problem-solving traditionally relies on heuristics, but manually crafting effective
heuristics for complex problems remains challenging. While recent work like FunSearch has …