Mobile edge intelligence for large language models: A contemporary survey

G Qu, Q Chen, W Wei, Z Lin, X Chen… - … Surveys & Tutorials, 2025 - ieeexplore.ieee.org
On-device large language models (LLMs), referring to running LLMs on edge devices, have
raised considerable interest since they are more cost-effective, latency-efficient, and privacy …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng… - arxiv preprint arxiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

Localvaluebench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models

GI Meadows, NWL Lau, EA Susanto, CL Yu… - arxiv preprint arxiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) requires robust evaluation of their
alignment with local values and ethical standards, especially as existing benchmarks often …

A Review on Edge Large Language Models: Design, Execution, and Applications

Y Zheng, Y Chen, B Qian, X Shi, Y Shu… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) have revolutionized natural language processing with their
exceptional capabilities. However, deploying LLMs on resource-constrained edge devices …

Federated full-parameter tuning of billion-sized language models with communication cost under 18 kilobytes

Z Qin, D Chen, B Qian, B Ding, Y Li, S Deng - arxiv preprint arxiv …, 2023 - arxiv.org
Pre-trained large language models (LLMs) require fine-tuning to improve their
responsiveness to natural language instructions. Federated learning (FL) offers a way to …

Llm for mobile: An initial roadmap

D Chen, Y Liu, M Zhou, Y Zhao, H Wang… - ACM Transactions on …, 2024 - dl.acm.org
When mobile meets LLMs, mobile app users deserve more intelligent usage
experiences. For this to happen, we argue that there is a strong need to apply LLMs for the …

Efficient training and inference: Techniques for large language models using llama

SR Cunningham, D Archambault, A Kung - Authorea Preprints, 2024 - techrxiv.org
Enhancing the efficiency of language models involves optimizing their training and
inference processes to reduce computational demands while maintaining high performance …

VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

Y Li, S Han, S Ji - arxiv preprint arxiv:2405.15179, 2024 - arxiv.org
As the adoption of large language models increases and the need for per-user or per-task
model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low …

AnyMatch--Efficient Zero-Shot Entity Matching with a Small Language Model

Z Zhang, P Groth, I Calixto, S Schelter - arxiv preprint arxiv:2409.04073, 2024 - arxiv.org
Entity matching (EM) is the problem of determining whether two records refer to the same
real-world entity, which is crucial in data integration, e.g., for product catalogs or address …

An LPDDR-based CXL-PNM Platform for TCO-efficient Inference of Transformer-based Large Language Models

SS Park, KS Kim, J So, J Jung, J Lee… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Transformer-based large language models (LLMs) such as Generative Pre-trained
Transformer (GPT) have become popular due to their remarkable performance across …