A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion models, and LLM-based multimodal models, are revolutionizing the entire machine …

A survey on scheduling techniques in computing and network convergence

S Tang, Y Yu, H Wang, G Wang, W Chen… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The computing demand for massive applications has led to the ubiquitous deployment of
computing power. This trend results in the urgent need for higher-level computing resource …

Resource-efficient algorithms and systems of foundation models: A survey

M Xu, D Cai, W Yin, S Wang, X Jin, X Liu - ACM Computing Surveys, 2025 - dl.acm.org
Large foundation models, including large language models, vision transformers, diffusion models,
and large language model-based multimodal models, are revolutionizing the entire machine …

Efficient training of large language models on distributed infrastructures: a survey

J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) like GPT and LLaMA are revolutionizing the AI industry with
their sophisticated capabilities. Training these models requires vast GPU clusters and …
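For context, the core primitive such distributed-training systems build on is data-parallel gradient synchronization. Below is a minimal NumPy simulation of that idea; the toy linear model, shard sizes, worker count, and learning rate are illustrative assumptions, not details from the survey.

```python
# Minimal simulation of data-parallel training: each worker computes
# gradients on its own shard, gradients are averaged (the "all-reduce"),
# and every replica applies the identical update.
import numpy as np

rng = np.random.default_rng(0)
num_workers = 4
w = np.zeros(8)                                  # shared model replica (one weight vector)
data = rng.normal(size=(num_workers, 32, 8))     # per-worker input shards
targets = data @ np.arange(8.0) + 1.0            # synthetic regression targets

def local_gradient(w, x, y):
    """Mean-squared-error gradient computed on one worker's shard."""
    pred = x @ w
    return 2.0 * x.T @ (pred - y) / len(y)

for step in range(100):
    grads = [local_gradient(w, data[r], targets[r]) for r in range(num_workers)]
    g = np.mean(grads, axis=0)                   # all-reduce: average gradients across workers
    w -= 0.01 * g                                # every replica takes the same SGD step

print("learned weights:", np.round(w, 2))
```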

Tale of two Cs: Computation vs. communication scaling for future transformers on future hardware

S Pati, S Aga, M Islam, N Jayasena… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Scaling neural network models has delivered dramatic quality gains across ML problems.
However, this scaling has also increased the reliance on efficient distributed training techniques …
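To make the computation-versus-communication trade-off concrete, here is a back-of-envelope sketch for one transformer feed-forward block under tensor parallelism. All hardware and model numbers (peak FLOP/s, link bandwidth, hidden sizes, sequence length) are illustrative assumptions, not figures from the paper.

```python
# Back-of-envelope estimate of compute vs. communication time for one
# transformer feed-forward block split across GPUs with tensor parallelism.
def layer_times(hidden, seq_len, tp_degree,
                peak_flops=300e12,        # assumed sustained FLOP/s per GPU
                link_bw=100e9):           # assumed all-reduce bandwidth, bytes/s
    # Two matmuls of the FFN (hidden -> 4*hidden -> hidden), split over tp_degree GPUs.
    flops = 2 * 2 * seq_len * hidden * 4 * hidden / tp_degree
    compute_s = flops / peak_flops
    # Tensor parallelism all-reduces one fp16 activation tensor of shape
    # (seq_len, hidden) per FFN block; a ring all-reduce moves ~2x that data.
    comm_s = 2 * (seq_len * hidden * 2) / link_bw
    return compute_s, comm_s

for hidden in (4096, 8192, 16384):
    c, m = layer_times(hidden, seq_len=2048, tp_degree=8)
    print(f"hidden={hidden:6d}  compute={c*1e3:6.2f} ms  comm={m*1e3:6.2f} ms  ratio={c/m:4.1f}")
```

The point of the sketch is only that compute grows quadratically with the hidden size while the communicated activations grow linearly, so the balance between the two Cs shifts as models and hardware scale.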

Training and serving system of foundation models: A comprehensive survey

J Zhou, Y Chen, Z Hong, W Chen, Y Yu… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
Foundation models (e.g., ChatGPT, DALL-E, PengCheng Mind, PanGu-Σ) have demonstrated
extraordinary performance in key technological areas, such as natural language processing …

Fast state restoration in LLM serving with HCache

S Gao, Y Chen, J Shu - arXiv preprint arXiv:2410.05004, 2024 - arxiv.org
The growing complexity of LLM usage today, e.g., multi-round conversation and retrieval-
augmented generation (RAG), makes contextual states (i.e., KV cache) reusable across user …
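A minimal sketch of why such contextual state is reusable: cache the K/V tensors produced by prefill, keyed by the shared prompt prefix, and restore them on later requests instead of recomputing. This only illustrates the general idea; it is not HCache's actual storage or restoration mechanism, and all names and shapes are assumptions.

```python
# Conceptual sketch of reusing contextual state (the KV cache) across requests,
# keyed by the shared prompt prefix.
import numpy as np

class KVCacheStore:
    """Maps a token prefix to the key/value tensors produced by prefill."""
    def __init__(self):
        self._store = {}

    def lookup(self, prefix_tokens):
        return self._store.get(tuple(prefix_tokens))    # None on a cache miss

    def insert(self, prefix_tokens, keys, values):
        self._store[tuple(prefix_tokens)] = (keys, values)

def prefill(tokens, d_head=64):
    """Stand-in for the expensive prefill pass that builds the K/V tensors."""
    n = len(tokens)
    return np.zeros((n, d_head)), np.zeros((n, d_head))

store = KVCacheStore()
prompt = [101, 7592, 2088, 102]           # e.g. a shared system/RAG prompt

cached = store.lookup(prompt)
if cached is None:
    k, v = prefill(prompt)                # pay the prefill cost once
    store.insert(prompt, k, v)
else:
    k, v = cached                         # later turns restore state instead of recomputing
print("KV shapes:", k.shape, v.shape)
```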

Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference

J Du, J Wei, J Jiang, S Cheng, D Huang… - Proceedings of the 29th …, 2024 - dl.acm.org
Distributed large model inference still faces a dilemma in balancing cost and effectiveness.
Online scenarios demand intra-operator parallelism to achieve low latency and intensive …
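The two parallelism styles named in the title can be contrasted in a toy NumPy example: intra-operator parallelism splits a single matmul across devices, while inter-operator parallelism assigns whole layers to different pipeline stages. The two-layer MLP and its shapes below are assumptions for illustration, not Liger's interleaving scheme.

```python
# Toy contrast between intra-operator parallelism (one matmul split across
# devices) and inter-operator parallelism (whole layers on different stages).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))               # a small batch of activations
w1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=(32, 8))

# Intra-operator: split w1 column-wise over two "devices", compute the shards
# independently, then concatenate (the gather a real system would communicate).
shards = np.split(w1, 2, axis=1)
partials = [x @ shard for shard in shards]
h_intra = np.concatenate(partials, axis=1)

# Inter-operator: device 0 owns layer 1, device 1 owns layer 2; activations
# flow between the stages like a two-stage pipeline.
h_stage0 = x @ w1                          # "device 0"
y_stage1 = h_stage0 @ w2                   # "device 1"

assert np.allclose(h_intra, h_stage0)      # both schemes compute the same math
print("pipeline output shape:", y_stage1.shape)
```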

Exploring the performance and efficiency of transformer models for NLP on mobile devices

I Panopoulos, S Nikolaidis, SI Venieris… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
Deep learning (DL) is characterised by its dynamic nature, with new deep neural network
(DNN) architectures and approaches emerging every few years, driving the field's …

ProTrain: Efficient LLM Training via Memory-Aware Techniques

H Yang, J Zhou, Y Fu, X Wang, R Roane… - arXiv preprint arXiv …, 2024 - arxiv.org
Training Large Language Models (LLMs) is extremely memory-hungry. To address this,
existing work exploits the combination of CPU and GPU memory for the training process, such as …
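A generic sketch of the CPU/GPU split such memory-aware systems rely on: fp16 parameters stay on the "GPU" while fp32 master weights and the Adam moments are offloaded to the "CPU", where the update runs. This illustrates the common offloading pattern (in the spirit of ZeRO-Offload-style designs), not ProTrain's specific policy; all names, sizes, and hyperparameters are assumptions.

```python
# Sketch of CPU-offloaded optimizer state with fp16 working weights on the GPU.
import numpy as np

# "GPU" side: half-precision working copy of the parameters.
gpu_params = np.zeros(1024, dtype=np.float16)

# "CPU" side: full-precision master weights plus Adam moments (the memory-heavy state).
cpu_master = gpu_params.astype(np.float32)
cpu_m = np.zeros_like(cpu_master)
cpu_v = np.zeros_like(cpu_master)

def adam_step_on_cpu(grad_fp16, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, t=1):
    """Run the optimizer where its state lives, then ship fp16 weights back."""
    global gpu_params
    g = grad_fp16.astype(np.float32)               # gradient copied GPU -> CPU
    cpu_m[:] = b1 * cpu_m + (1 - b1) * g
    cpu_v[:] = b2 * cpu_v + (1 - b2) * g * g
    m_hat = cpu_m / (1 - b1 ** t)
    v_hat = cpu_v / (1 - b2 ** t)
    cpu_master[:] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    gpu_params = cpu_master.astype(np.float16)     # updated weights copied CPU -> GPU

grad = np.full(1024, 0.5, dtype=np.float16)        # pretend output of a backward pass
adam_step_on_cpu(grad)
print("param after one offloaded step:", gpu_params[0])
```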