A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion models,
and large language model-based multimodal models, are revolutionizing the entire machine …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
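For context, the core primitive such distributed-training systems build on is data-parallel gradient synchronization. Below is a minimal NumPy simulation of that idea; the toy linear model, shard sizes, worker count, and learning rate are illustrative assumptions, not details from the survey.

```python
# Minimal simulation of data-parallel training: each worker computes
# gradients on its own shard, gradients are averaged (the "all-reduce"),
# and every replica applies the identical update.
import numpy as np

rng = np.random.default_rng(0)
num_workers = 4
w = np.zeros(8)                                  # shared model replica (one weight vector)
data = rng.normal(size=(num_workers, 32, 8))     # per-worker input shards
targets = data @ np.arange(8.0) + 1.0            # synthetic regression targets

def local_gradient(w, x, y):
    """Mean-squared-error gradient computed on one worker's shard."""
    pred = x @ w
    return 2.0 * x.T @ (pred - y) / len(y)

for step in range(100):
    grads = [local_gradient(w, data[r], targets[r]) for r in range(num_workers)]
    g = np.mean(grads, axis=0)                   # all-reduce: average gradients across workers
    w -= 0.01 * g                                # every replica takes the same SGD step

print("learned weights:", np.round(w, 2))
```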

Tale of two Cs: Computation vs. communication scaling for future transformers on future hardware

S Pati, S Aga, M Islam, N Jayasena… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Scaling neural network models has delivered dramatic quality gains across ML problems.
However, this scaling has also increased the reliance on efficient distributed training techniques …
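To make the computation-versus-communication trade-off concrete, here is a back-of-envelope sketch for one transformer feed-forward block under tensor parallelism. All hardware and model numbers (peak FLOP/s, link bandwidth, hidden sizes, sequence length) are illustrative assumptions, not figures from the paper.

```python
# Back-of-envelope estimate of compute vs. communication time for one
# transformer feed-forward block split across GPUs with tensor parallelism.
def layer_times(hidden, seq_len, tp_degree,
                peak_flops=300e12,        # assumed sustained FLOP/s per GPU
                link_bw=100e9):           # assumed all-reduce bandwidth, bytes/s
    # Two matmuls of the FFN (hidden -> 4*hidden -> hidden), split over tp_degree GPUs.
    flops = 2 * 2 * seq_len * hidden * 4 * hidden / tp_degree
    compute_s = flops / peak_flops
    # Tensor parallelism all-reduces one fp16 activation tensor of shape
    # (seq_len, hidden) per FFN block; a ring all-reduce moves ~2x that data.
    comm_s = 2 * (seq_len * hidden * 2) / link_bw
    return compute_s, comm_s

for hidden in (4096, 8192, 16384):
    c, m = layer_times(hidden, seq_len=2048, tp_degree=8)
    print(f"hidden={hidden:6d}  compute={c*1e3:6.2f} ms  comm={m*1e3:6.2f} ms  ratio={c/m:4.1f}")
```

The point of the sketch is only that compute grows quadratically with the hidden size while the communicated activations grow linearly, so the balance between the two Cs shifts as models and hardware scale.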

Training and serving system of foundation models: A comprehensive survey

J Zhou, Y Chen, Z Hong, W Chen, Y Yu… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-Σ) have demonstrated
extraordinary performance in key technological areas, such as natural language processing …

Fast state restoration in LLM serving with HCache

S Gao, Y Chen, J Shu - arXiv preprint arXiv:2410.05004, 2024 - arxiv.org
The growing complexity of LLM usage today, e.g., multi-round conversation and retrieval-
augmented generation (RAG), makes contextual states (i.e., KV cache) reusable across user …
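A minimal sketch of why such contextual state is reusable: cache the K/V tensors produced by prefill, keyed by the shared prompt prefix, and restore them on later requests instead of recomputing. This only illustrates the general idea; it is not HCache's actual storage or restoration mechanism, and all names and shapes are assumptions.

```python
# Conceptual sketch of reusing contextual state (the KV cache) across requests,
# keyed by the shared prompt prefix.
import numpy as np

class KVCacheStore:
    """Maps a token prefix to the key/value tensors produced by prefill."""
    def __init__(self):
        self._store = {}

    def lookup(self, prefix_tokens):
        return self._store.get(tuple(prefix_tokens))    # None on a cache miss

    def insert(self, prefix_tokens, keys, values):
        self._store[tuple(prefix_tokens)] = (keys, values)

def prefill(tokens, d_head=64):
    """Stand-in for the expensive prefill pass that builds the K/V tensors."""
    n = len(tokens)
    return np.zeros((n, d_head)), np.zeros((n, d_head))

store = KVCacheStore()
prompt = [101, 7592, 2088, 102]           # e.g. a shared system/RAG prompt

cached = store.lookup(prompt)
if cached is None:
    k, v = prefill(prompt)                # pay the prefill cost once
    store.insert(prompt, k, v)
else:
    k, v = cached                         # later turns restore state instead of recomputing
print("KV shapes:", k.shape, v.shape)
```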

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference

J Du, J Wei, J Jiang, S Cheng, D Huang… - Proceedings of the 29th …, 2024 - dl.acm.org
Distributed large model inference still faces a dilemma in balancing cost and effectiveness.
Online scenarios demand intra-operator parallelism to achieve low latency and intensive …
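The two parallelism styles named in the title can be contrasted in a toy NumPy example: intra-operator parallelism splits a single matmul across devices, while inter-operator parallelism assigns whole layers to different pipeline stages. The two-layer MLP and its shapes below are assumptions for illustration, not Liger's interleaving scheme.

```python
# Toy contrast between intra-operator parallelism (one matmul split across
# devices) and inter-operator parallelism (whole layers on different stages).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))               # a small batch of activations
w1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=(32, 8))

# Intra-operator: split w1 column-wise over two "devices", compute the shards
# independently, then concatenate (the gather a real system would communicate).
shards = np.split(w1, 2, axis=1)
partials = [x @ shard for shard in shards]
h_intra = np.concatenate(partials, axis=1)

# Inter-operator: device 0 owns layer 1, device 1 owns layer 2; activations
# flow between the stages like a two-stage pipeline.
h_stage0 = x @ w1                          # "device 0"
y_stage1 = h_stage0 @ w2                   # "device 1"

assert np.allclose(h_intra, h_stage0)      # both schemes compute the same math
print("pipeline output shape:", y_stage1.shape)
```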

Exploring the performance and efficiency of transformer models for NLP on mobile devices

I Panopoulos, S Nikolaidis, SI Venieris… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
Deep learning (DL) is characterised by its dynamic nature, with new deep neural network
(DNN) architectures and approaches emerging every few years, driving the field's …

ProTrain: Efficient LLM Training via Memory-Aware Techniques

H Yang, J Zhou, Y Fu, X Wang, R Roane… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) is extremely memory-hungry. To address this,
existing work exploits the combination of CPU and GPU memory for the training process, such as …
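A generic sketch of the CPU/GPU split such memory-aware systems rely on: fp16 parameters stay on the "GPU" while fp32 master weights and the Adam moments are offloaded to the "CPU", where the update runs. This illustrates the common offloading pattern (in the spirit of ZeRO-Offload-style designs), not ProTrain's specific policy; all names, sizes, and hyperparameters are assumptions.

```python
# Sketch of CPU-offloaded optimizer state with fp16 working weights on the GPU.
import numpy as np

# "GPU" side: half-precision working copy of the parameters.
gpu_params = np.zeros(1024, dtype=np.float16)

# "CPU" side: full-precision master weights plus Adam moments (the memory-heavy state).
cpu_master = gpu_params.astype(np.float32)
cpu_m = np.zeros_like(cpu_master)
cpu_v = np.zeros_like(cpu_master)

def adam_step_on_cpu(grad_fp16, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """Run the optimizer where its state lives, then ship fp16 weights back."""
    global gpu_params
    g = grad_fp16.astype(np.float32)               # gradient copied GPU -> CPU
    cpu_m[:] = b1 * cpu_m + (1 - b1) * g
    cpu_v[:] = b2 * cpu_v + (1 - b2) * g * g
    m_hat = cpu_m / (1 - b1 ** t)
    v_hat = cpu_v / (1 - b2 ** t)
    cpu_master[:] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    gpu_params = cpu_master.astype(np.float16)     # updated weights copied CPU -> GPU

grad = np.full(1024, 0.5, dtype=np.float16)        # pretend output of a backward pass
adam_step_on_cpu(grad)
print("param after one offloaded step:", gpu_params[0])
```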