Mobile edge intelligence for large language models: A contemporary survey

G Qu, Q Chen, W Wei, Z Lin, X Chen… - … Surveys & Tutorials, 2025 - ieeexplore.ieee.org
On-device large language models (LLMs), referring to running LLMs on edge devices, have
raised considerable interest since they are more cost-effective, latency-efficient, and privacy …

Tool learning with large language models: A survey

C Qu, S Dai, X Wei, H Cai, S Wang, D Yin, J Xu… - Frontiers of Computer …, 2025 - Springer
Recently, tool learning with large language models (LLMs) has emerged as a promising
paradigm for augmenting the capabilities of LLMs to tackle highly complex problems …

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …

SpotServe: Serving generative large language models on preemptible instances

X Miao, C Shi, J Duan, X Xi, D Lin, B Cui… - Proceedings of the 29th …, 2024 - dl.acm.org
The high computational and memory requirements of generative large language models
(LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary …

Efficient and green large language models for software engineering: Vision and the road ahead

J Shi, Z Yang, D Lo - ACM Transactions on Software Engineering and …, 2024 - dl.acm.org
Large Language Models (LLMs) have recently shown remarkable capabilities in various
software engineering tasks, spurring the rapid growth of the Large Language Models for …

Break the sequential dependency of LLM inference using lookahead decoding

Y Fu, P Bailis, I Stoica, H Zhang - arXiv preprint arXiv:2402.02057, 2024 - arxiv.org
Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded,
resulting in high latency and significant wastes of the parallel processing power of modern …

From decoding to meta-generation: Inference-time algorithms for large language models

S Welleck, A Bertsch, M Finlayson… - arXiv preprint arXiv …, 2024 - arxiv.org
One of the most striking findings in modern research on large language models (LLMs) is
that scaling up compute during training leads to better results. However, less attention has …

Large language models and games: A survey and roadmap

R Gallotta, G Todd, M Zammit, S Earle, A Liapis… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have seen an explosive increase in research on large language models
(LLMs), and accompanying public engagement on the topic. While starting as a niche area …

LLM inference serving: Survey of recent advances and opportunities

B Li, Y Jiang, V Gadepally, D Tiwari - arXiv preprint arXiv:2407.12391, 2024 - arxiv.org
This survey offers a comprehensive overview of recent advancements in Large Language
Model (LLM) serving systems, focusing on research since the year 2023. We specifically …

SpecInfer: Accelerating large language model serving with tree-based speculative inference and verification

X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang… - Proceedings of the 29th …, 2024 - dl.acm.org
This paper introduces SpecInfer, a system that accelerates generative large language model
(LLM) serving with tree-based speculative inference and verification. The key idea behind …