Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Performance enhancement of artificial intelligence: A survey

M Krichen, MS Abdalzaher - Journal of Network and Computer Applications, 2024 - Elsevier
The advent of machine learning (ML) and artificial intelligence (AI) has brought about a
significant transformation across multiple industries, as it has facilitated the automation of …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion,
and large language model-based multimodal models, are revolutionizing the entire machine …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

BPipe: Memory-balanced pipeline parallelism for training large language models

T Kim, H Kim, GI Yu, BG Chun - International Conference on …, 2023 - proceedings.mlr.press
Pipeline parallelism is a key technique for training large language models within GPU
clusters. However, it often leads to a memory imbalance problem, where certain GPUs face …
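The memory imbalance mentioned in the abstract can be illustrated with a back-of-the-envelope sketch, assuming a standard 1F1B pipeline schedule in which stage i keeps (num_stages − i) microbatches of activations in flight; the helper and numbers below are hypothetical and are not the BPipe method itself.

```python
# Hypothetical illustration of pipeline-parallel memory imbalance under a
# 1F1B schedule: earlier stages must stash more in-flight microbatch
# activations than later stages, so their peak memory is higher.

def stage_activation_memory_gb(num_stages: int, act_gb_per_microbatch: float) -> list[float]:
    """Peak stashed-activation memory per pipeline stage (illustrative only)."""
    return [(num_stages - i) * act_gb_per_microbatch for i in range(num_stages)]

if __name__ == "__main__":
    # Assumed numbers: 8 stages, 2 GB of activations per microbatch per stage.
    for stage, mem in enumerate(stage_activation_memory_gb(8, 2.0)):
        print(f"stage {stage}: ~{mem:.0f} GB stashed activations")
    # Stage 0 holds ~16 GB while stage 7 holds ~2 GB: the imbalance the paper targets.
```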

Enabling Parallelism Hot Switching for Efficient Training of Large Language Models

H Ge, F Fu, H Li, X Wang, S Lin, Y Wang, X Nie… - Proceedings of the …, 2024 - dl.acm.org
Training of large-scale deep learning models necessitates parallelizing the model and data
across numerous devices, and the choice of parallelism strategy substantially depends on …

Proteus: Simulating the performance of distributed DNN training

J Duan, X Li, P Xu, X Zhang, S Yan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
DNN models are becoming increasingly larger to achieve unprecedented accuracy, and the
accompanying increased computation and memory requirements necessitate the …

Does compressing activations help model parallel training?

S Bian, D Li, H Wang, E Xing… - … of Machine Learning …, 2024 - proceedings.mlsys.org
Foundation models have superior performance across a wide array of machine learning
tasks. The training of these models typically involves model parallelism (MP) to navigate the …
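As a rough illustration of what "compressing activations" can mean in this setting, the sketch below applies per-tensor int8 quantization to an activation tensor before it crosses a model-parallel boundary and dequantizes it on the other side; this is a generic scheme for illustration, not necessarily the one studied in the paper.

```python
# Generic activation-compression sketch (not the paper's specific method):
# per-tensor uniform int8 quantization before a model-parallel transfer,
# followed by dequantization on the receiving device.
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 activations to int8 plus a per-tensor scale (~4x smaller payload)."""
    scale = float(np.abs(x).max()) / 127.0
    if scale == 0.0:
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

acts = np.random.randn(4, 1024).astype(np.float32)  # hypothetical activation tensor
q, scale = quantize_int8(acts)
recovered = dequantize_int8(q, scale)
print("max abs reconstruction error:", float(np.abs(acts - recovered).max()))
```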

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training

S Li, Z Lai, Y Hao, W Liu, K Ge, X Deng, D Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning is experiencing a rise in foundation models that are expected to lead in
various fields. The massive number of parameters necessitates the use of tensor model …
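For readers unfamiliar with tensor model parallelism, the sketch below shows the basic idea the abstract assumes: a linear layer's weight matrix is split column-wise across devices, each device computes its own output slice, and the slices are gathered. Overlapping that communication with computation is what the paper automates; the sketch makes no attempt at that and is purely illustrative.

```python
# Minimal column-parallel linear layer sketch (illustrative; not the paper's
# automated system): the weight matrix is split column-wise across "devices",
# each shard produces a slice of the output, and the slices are concatenated
# (an all-gather in a real distributed setting).
import numpy as np

def column_parallel_matmul(x: np.ndarray, w: np.ndarray, num_devices: int) -> np.ndarray:
    shards = np.array_split(w, num_devices, axis=1)     # one column shard per device
    partial_outputs = [x @ shard for shard in shards]   # computed in parallel in practice
    return np.concatenate(partial_outputs, axis=1)      # gather the output slices

x = np.random.randn(2, 16)
w = np.random.randn(16, 32)
assert np.allclose(column_parallel_matmul(x, w, num_devices=4), x @ w)
```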

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

H Luo, H Wu, H Zhou, L Xing, Y Di, J Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Although deep models have been widely explored in solving partial differential equations
(PDEs), previous works are primarily limited to data with only up to tens of thousands of …