On-device language models: A comprehensive review
The advent of large language models (LLMs) has revolutionized natural language processing
applications, and running LLMs on edge devices has become increasingly attractive for …
A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …
D-LLM: A token adaptive computing resource allocation strategy for large language models
Y Jiang, H Wang, L Xie, H Zhao… - Advances in Neural …, 2025 - proceedings.neurips.cc
Large language models have shown an impressive societal impact owing to their excellent
understanding and logical reasoning skills. However, such strong ability relies on a huge …
InstInfer: In-storage attention offloading for cost-effective long-context LLM inference
The widespread adoption of Large Language Models (LLMs) marks a significant milestone in
generative AI. Nevertheless, the increasing context length and batch size in offline LLM …
WiP: Efficient LLM prefilling with mobile NPU
Large language models (LLMs) play a crucial role in various Natural Language Processing
(NLP) tasks, prompting their deployment on mobile devices for inference. However, a …
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
Disaggregating the prefill and decoding phases represents an effective new paradigm for
generative inference of large language models (LLMs), which eliminates prefill-decoding …
Hybrid Heterogeneous Clusters Can Lower the Energy Consumption of LLM Inference Workloads
Both the training and use of Large Language Models (LLMs) require large amounts of
energy. Their increasing popularity, therefore, raises critical concerns regarding the energy …
Online Workload Allocation and Energy Optimization in Large Language Model Inference Systems
G Wilkins - 2024 - grantwilkins.github.io
The rapid adoption of Large Language Models (LLMs) has furthered natural language
processing, aiding text generation, question answering, and sentiment analysis …
Smart QoS-Aware Resource Management For Edge Intelligence Systems
M Hosseinzadeh - uknowledge.uky.edu
There are several definitions of smart cities. One key point common to these definitions is
that smart cities are technologically advanced cities that connect everything in a complex …