Mobile Foundation Model as Firmware

J Yuan, C Yang, D Cai, S Wang, X Yuan… - arXiv preprint arXiv …, 2023 - caidongqi.com
In today's landscape, smartphones have evolved into hubs for hosting a multitude of deep
learning models aimed at local execution. A key realization driving this work is the notable …

Advances of pipeline model parallelism for deep learning training: an overview

L Guan, DS Li, JY Liang, WJ Wang, KS Ge… - Journal of Computer …, 2024 - Springer
Deep learning has become the cornerstone of artificial intelligence, playing an increasingly
important role in human production and lifestyle. However, as the complexity of problem …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …

Exploring the Impact of In-Browser Deep Learning Inference on Quality of User Experience and Performance

Q Wang, S Jiang, Z Chen, X Cao, Y Li, A Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep Learning (DL) is increasingly being integrated into Web applications through a
method known as "in-browser inference", where the DL processes occur directly within Web …

Anatomizing Deep Learning Inference in Web Browsers

Q Wang, S Jiang, Z Chen, X Cao, Y Li, A Li… - ACM Transactions on …, 2024 - dl.acm.org
Web applications have increasingly adopted Deep Learning (DL) through in-browser
inference, wherein DL inference performs directly within Web browsers. The actual …

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

C Liao, M Sun, Z Yang, K Chen, B Yuan, F Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in large language models have brought immense value to the world, with
their superior capabilities stemming from the massive number of parameters they utilize …

AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning

Z Sun, H Cao, Y Wang, G Feng, S Chen… - Proceedings of the 29th …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated powerful capabilities, requiring huge
memory with their increasing sizes and sequence lengths, thus demanding larger parallel …

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training

S Li, Z Lai, Y Hao, W Liu, K Ge, X Deng, D Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep learning is experiencing a rise in foundation models that are expected to lead in
various fields. The massive number of parameters necessitates the use of tensor model …

Commodification of compute

J Kristensen, D Wender, C Anthony - arXiv preprint arXiv:2406.19261, 2024 - arxiv.org
The rapid advancements in artificial intelligence, big data analytics, and cloud computing
have precipitated an unprecedented demand for computational resources. However, the …

Mobile Foundation Model as Firmware

J Yuan, C Yang, D Cai, S Wang, X Yuan… - Proceedings of the 30th …, 2024 - dl.acm.org
In the current AI era, mobile devices such as smartphones are tasked with executing a
myriad of deep neural networks (DNNs) locally. It presents a complex landscape, as these …