InfiniGen: Efficient generative inference of large language models with dynamic KV cache management
Transformer-based large language models (LLMs) demonstrate impressive performance
across various natural language processing tasks. Serving LLM inference for generating …
Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI + IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
ZeRO-Infinity: Breaking the GPU memory wall for extreme scale deep learning
In the last three years, the largest dense deep learning models have grown over 1000x to
reach hundreds of billions of parameters, while the GPU memory has only grown by 5x (16 …
ZeRO-Offload: Democratizing billion-scale model training
Large-scale model training has been a playing ground for a limited few requiring complex
model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload …
nnScaler: Constraint-Guided Parallelization Plan Generation for Deep Learning Training
With the growing model size of deep neural networks (DNN), deep learning training is
increasingly relying on handcrafted search spaces to find efficient parallelization execution …
Deep learning-based natural language processing in human-agent interaction: Applications, advancements and challenges
Human-Agent Interaction is at the forefront of rapid development, with integrating
deep learning techniques into natural language processing representing significant …
AntMan: Dynamic scaling on GPU clusters for deep learning
Efficiently scheduling deep learning jobs on large-scale GPU clusters is crucial for job
performance, system throughput, and hardware utilization. It is getting ever more …
POET: Training neural networks on tiny devices with integrated rematerialization and paging
Fine-tuning models on edge devices like mobile phones would enable privacy-preserving
personalization over sensitive data. However, edge training has historically been limited to …
Melon: Breaking the memory wall for resource-efficient on-device machine learning
On-device learning is a promising technique for emerging privacy-preserving machine
learning paradigms. However, through quantitative experiments, we find that commodity …
BPipe: Memory-balanced pipeline parallelism for training large language models
Pipeline parallelism is a key technique for training large language models within GPU
clusters. However, it often leads to a memory imbalance problem, where certain GPUs face …