Mobile edge intelligence for large language models: A contemporary survey

G Qu, Q Chen, W Wei, Z Lin, X Chen… - … Surveys & Tutorials, 2025 - ieeexplore.ieee.org
On-device large language models (LLMs), referring to running LLMs on edge devices, have
attracted considerable interest since they are more cost-effective, latency-efficient, and privacy …

Foundation models in smart agriculture: Basics, opportunities, and challenges

J Li, M Xu, L Xiang, D Chen, W Zhuang, X Yin… - … and Electronics in …, 2024 - Elsevier
The past decade has witnessed the rapid development and adoption of machine and deep
learning (ML & DL) methodologies in agricultural systems, showcased by great successes in …

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - Transactions of the Association for …, 2024 - direct.mit.edu
Abstract Large Language Models (LLMs) have transformed natural language processing
tasks successfully. Yet, their large size and high computational needs pose challenges for …

KVQuant: Towards 10 million context length LLM inference with KV cache quantization

C Hooper, S Kim, H Mohammadzadeh… - Advances in …, 2025 - proceedings.neurips.cc
LLMs are seeing growing use for applications which require large context windows, and with
these large context windows KV cache activations surface as the dominant contributor to …
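
A back-of-envelope calculation shows why the KV cache dominates memory at such context lengths and what low-bit quantization buys. The model dimensions below are assumptions (a roughly Llama-2-7B-like configuration), not figures from the paper:

```python
# Illustrative KV cache sizing; 32 layers / 32 heads / head_dim 128 and
# fp16 storage (2 bytes/element) are assumed, Llama-2-7B-like values.
def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128,
                   bytes_per_elem=2):
    # Factor of 2 covers keys AND values; one entry per layer/head/position.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_elem

fp16_1m = kv_cache_bytes(1_000_000)                        # fp16 at 1M tokens
int3_1m = kv_cache_bytes(1_000_000, bytes_per_elem=3 / 8)  # ~3-bit at 1M tokens

print(f"fp16 KV cache @ 1M tokens:  {fp16_1m / 2**30:.0f} GiB")   # ~488 GiB
print(f"~3-bit KV cache @ 1M tokens: {int3_1m / 2**30:.0f} GiB")  # ~92 GiB
```

Under these assumptions the fp16 cache alone far exceeds a single accelerator's memory, which is why sub-4-bit KV cache quantization is the lever this line of work pulls.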

OmniQuant: Omnidirectionally calibrated quantization for large language models

W Shao, M Chen, Z Zhang, P Xu, L Zhao, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing tasks.
However, their practical deployment is hindered by their immense memory and computation …
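
One ingredient of this line of work is choosing a clipping threshold for weight quantization rather than using the raw min-max range. The sketch below is a toy stdlib-only illustration of that idea: OmniQuant learns the clipping parameters during calibration, whereas here they are simply grid-searched, and the weight data is synthetic:

```python
import random

random.seed(0)
# Toy weight vector with a few large outliers, as is typical of LLM layers.
w = [random.gauss(0, 1) for _ in range(4096)]
for i in range(4):
    w[i] *= 50  # outliers stretch the min-max quantization range

def quantize(w, n_bits=4, clip_ratio=1.0):
    # Symmetric uniform quantization. clip_ratio < 1 shrinks the range,
    # trading clipped-outlier error for finer resolution on the bulk of
    # the weights (the intuition behind learnable weight clipping; here
    # the ratio is grid-searched instead of learned).
    qmax = 2 ** (n_bits - 1) - 1
    scale = clip_ratio * max(abs(x) for x in w) / qmax
    return [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in w]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

mse_minmax = mse(w, quantize(w, clip_ratio=1.0))
mse_clipped = min(mse(w, quantize(w, clip_ratio=r / 20)) for r in range(1, 21))
print(f"min-max MSE: {mse_minmax:.4f}  best clipped MSE: {mse_clipped:.4f}")
```

With outliers present, the min-max scale is so coarse that most weights collapse to zero, so a clipped range gives a much lower reconstruction error on this toy data.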

Medusa: Simple LLM inference acceleration framework with multiple decoding heads

T Cai, Y Li, Z Geng, H Peng, JD Lee, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) employ auto-regressive decoding that requires sequential
computation, with each step reliant on the previous one's output. This creates a bottleneck …
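
The draft-then-verify loop that sidesteps this bottleneck can be sketched in a few lines. Everything below is a stand-in: the "base model" and "draft heads" are toy functions, and the real method verifies all draft tokens in one parallel forward pass rather than sequentially as written here:

```python
# Toy sketch of the accept/verify loop behind multi-head drafting.
def base_next(seq):
    # Stand-in autoregressive model: next token is (last + 1) mod 10.
    return (seq[-1] + 1) % 10

def draft(seq, k=4):
    # Stand-in draft heads: guess k future tokens, deliberately wrong
    # from position 3 onward to exercise partial acceptance.
    guesses, t = [], seq[-1]
    for i in range(k):
        t = (t + 1) % 10
        guesses.append(t if i < 3 else 0)
    return guesses

def speculative_step(seq):
    # Verify draft tokens against the base model; accept the longest
    # correct prefix, then take one guaranteed token from the base model,
    # so each step emits (accepted + 1) tokens instead of 1.
    cur, accepted = list(seq), 0
    for g in draft(seq):
        if base_next(cur) != g:
            break
        cur.append(g)
        accepted += 1
    cur.append(base_next(cur))
    return cur, accepted

seq, accepted = speculative_step([1, 2, 3])
print(seq, "accepted from draft:", accepted)  # [1..7], 3 draft tokens accepted
```

Because the base model validates every accepted token, the output matches plain autoregressive decoding; the speedup comes from verifying several positions per model call.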

MobileLLM: Optimizing sub-billion parameter language models for on-device use cases

Z Liu, C Zhao, F Iandola, C Lai, Y Tian… - … on Machine Learning, 2024 - openreview.net
This paper addresses the growing need for efficient large language models (LLMs) on
mobile devices, driven by increasing cloud costs and latency concerns. We focus on …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

KIVI: A tuning-free asymmetric 2bit quantization for KV cache

Z Liu, J Yuan, H Jin, S Zhong, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficiently serving large language models (LLMs) requires batching of many requests to
reduce the cost per request. Yet, with larger batch sizes and longer context lengths, the key …
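
The "asymmetric" design here refers to quantizing keys and values along different axes. The toy sketch below illustrates why axis choice matters when one key channel carries outliers, echoing the paper's reported finding (keys per-channel, values per-token); the 3×3 data is fabricated for illustration and omits the paper's grouping and full-precision residual:

```python
# Asymmetric (zero-point) 2-bit quantize/dequantize along different axes.
def quant_dequant_2bit(vec):
    # One group: q = round((x - min) / scale), 2 bits -> 4 levels (0..3).
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 3 or 1.0
    return [round((x - lo) / scale) * scale + lo for x in vec]

def quantize_matrix(mat, axis):
    # axis=0: per-channel (each column); axis=1: per-token (each row).
    if axis == 1:
        return [quant_dequant_2bit(row) for row in mat]
    cols = [quant_dequant_2bit(col) for col in zip(*mat)]
    return [list(row) for row in zip(*cols)]

def mse(a, b):
    errs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(errs) / len(errs)

# Toy key matrix with a channel-wise outlier: column 1 is much larger.
keys = [[0.1, 5.0, -0.2], [0.2, 5.1, -0.1], [0.0, 4.9, 0.1]]

per_channel = mse(keys, quantize_matrix(keys, axis=0))
per_token = mse(keys, quantize_matrix(keys, axis=1))
print(f"keys: per-channel MSE {per_channel:.4f}  per-token MSE {per_token:.4f}")
```

Per-channel grouping keeps the outlier channel in its own quantization range, so the error stays small; a per-token range must span both the outlier and the small entries, wasting most of the four levels.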

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …