LMSYS-Chat-1M: A large-scale real-world LLM conversation dataset

L Zheng, WL Chiang, Y Sheng, T Li, S Zhuang… - arXiv preprint arXiv …, 2023 - arxiv.org
Studying how people interact with large language models (LLMs) in real-world scenarios is
increasingly important due to their widespread use in various applications. In this paper, we …

Towards efficient generative large language model serving: A survey from algorithms to systems

X Miao, G Oliaro, Z Zhang, X Cheng, H **… - arXiv preprint arXiv …, 2023 - arxiv.org
In the rapidly evolving landscape of artificial intelligence (AI), generative large language
models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However …

AutoMix: Automatically mixing language models

P Aggarwal, A Madaan, A Anand, SP Potharaju… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) are now available from cloud API providers in various sizes
and configurations. While this diversity offers a broad spectrum of choices, effectively …

Enhancing on-device LLM inference with historical cloud-based LLM interactions

Y Ding, C Niu, F Wu, S Tang, C Lyu… - Proceedings of the 30th …, 2024 - dl.acm.org
Many billion-scale large language models (LLMs) have been released for resource-
constraint mobile devices to provide local LLM inference service when cloud-based …

GraphRouter: A graph-based router for LLM selections

T Feng, Y Shen, J You - arXiv preprint arXiv:2410.03834, 2024 - arxiv.org
The rapidly growing number and variety of Large Language Models (LLMs) present
significant challenges in efficiently selecting the appropriate LLM for a given query …

Model compression and efficient inference for large language models: A survey

W Wang, W Chen, Y Luo, Y Long, Z Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer based large language models have achieved tremendous success. However,
the significant memory and computational costs incurred during the inference process make …

Cache & distil: Optimising API calls to large language models

G Ramírez, M Lindemann, A Birch, I Titov - arXiv preprint arXiv …, 2023 - arxiv.org
Large-scale deployment of generative AI tools often depends on costly API calls to a Large
Language Model (LLM) to fulfil user queries. To curtail the frequency of these calls, one can …

Teola: Towards end-to-end optimization of LLM-based applications

X Tan, Y Jiang, Y Yang, H Xu - arXiv preprint arXiv:2407.00326, 2024 - arxiv.org
Large language model (LLM)-based applications consist of both LLM and non-LLM
components, each contributing to the end-to-end latency. Despite great efforts to optimize …

A Survey on Effective Invocation Methods of Massive LLM Services

C Wang, B Zhang, D Sui, Z Tum, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models as a service (LMaaS) enable users to accomplish tasks without requiring
specialized knowledge, simply by paying a service provider. However, numerous providers …

Leveraging LLMs for optimised feature selection and embedding in structured data: A case study on graduate employment classification

R Haque, HN Goh, CY Ting, A Quek… - Computers and Education …, 2025 - Elsevier
The application of Machine Learning (ML) for predicting graduate student
employability is a growing area of research, driven by the need to align educational …